[ 
https://issues.apache.org/jira/browse/PHOENIX-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640298#comment-16640298
 ] 

Lars Hofhansl commented on PHOENIX-4932:
----------------------------------------

{quote}
Not 100% sure I understand what you mean. The aggregations computed by Phoenix 
can be recomputed if this state is detected when a split occurs as the rows are 
being traversed (since the row would not have been returned from the 
resultSet.next() call). I had a PR out for this, but it got more complicated 
than I liked. Maybe we can look at this on a case-by-case basis to see if there 
are times when it cannot be redone?
{quote}

I'm trying to find out whether we can teach the client to be generally more 
tolerant of HBase restarting scans due to splits (as in the scenarios I 
listed in the description). That way HBase can do its thing and Phoenix still 
provides correct results.
(As with PHOENIX-4849, after which Phoenix no longer cares about SPLITs for 
simple SELECTs.)
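
To make the aggregate case concrete, here is a rough sketch of what "deal with 
more than one result per scan" could look like for SUM and AVG. It is only an 
illustration, not Phoenix code: the class PartialAggregates and its Partial 
type are hypothetical names, and it assumes each partial result carries a 
SUM/COUNT pair and that no row is counted in more than one partial.

```java
import java.util.List;
import java.util.OptionalDouble;

/** Hypothetical client-side helper; none of these names exist in Phoenix. */
public class PartialAggregates {

    /** One partial SUM/COUNT pair, e.g. from one daughter region of a split scan. */
    public static final class Partial {
        public final long sum;
        public final long count;

        public Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Folds any number of partial results into a single SUM/COUNT pair. */
    public static Partial merge(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    /** AVG is derived from the merged totals, never by averaging partial averages. */
    public static OptionalDouble avg(List<Partial> partials) {
        Partial total = merge(partials);
        if (total.count == 0) {
            return OptionalDouble.empty(); // no rows: no average to report
        }
        return OptionalDouble.of((double) total.sum / total.count);
    }
}
```

The point is that the client never assumes exactly one result per scan; one 
partial or several go through the same path.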

> Brainstorm more ways to avoid special SPLIT handling in Phoenix
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-4932
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4932
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Priority: Major
>
> Currently Phoenix still requires special handling and retries (automatic and 
> manual, by the client user) when SPLITs occur in HBase.
> PHOENIX-4849 avoids that for "simple" SELECTs. I think we can go further if 
> we add a bit more logic to the client like this:
>  * Sorts. As we merge sort partial server results from the server scan, start 
> a "merge bucket" when we see that the next K/V is out of order (this can 
> happen when HBase executes partial scans across the new daughter regions).
>  * Aggregates. Make sure the client can deal with more than one result per 
> scan. E.g. for a SUM the scanner might return two results if HBase splits the 
> scan across two regions. Similarly for AVG, the client needs to deal with two 
> sets of SUM/COUNT.
>  * Offset. Make sure the client applies the offset. The server might return 
> more. (This might be more complicated... haven't looked too closely.)
> In summary: We should let HBase do its thing as much as possible. HBase 
> already deals with SPLITs: scans are restarted and continue across regions, 
> the region cache on the client is invalidated, etc.
> Just parking this here. This is not new. The ideas are probably not new 
> either.
> [~tdsilva], FYI.
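
The "merge bucket" idea from the Sorts item could be sketched roughly as 
follows. This is a minimal illustration under stated assumptions, not Phoenix 
code: MergeBuckets, toBuckets and mergeSorted are hypothetical names, the 
whole stream is buffered in memory for simplicity, and it assumes that each 
stretch between out-of-order points is itself sorted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of client-side merge buckets; not part of Phoenix. */
public class MergeBuckets {

    /** Starts a new bucket whenever the next item sorts before the previous one. */
    public static <T> List<List<T>> toBuckets(Iterable<T> stream, Comparator<? super T> cmp) {
        List<List<T>> buckets = new ArrayList<>();
        List<T> current = null;
        T prev = null;
        for (T item : stream) {
            if (current == null || cmp.compare(item, prev) < 0) {
                current = new ArrayList<>(); // out of order: open a new bucket
                buckets.add(current);
            }
            current.add(item);
            prev = item;
        }
        return buckets;
    }

    /** K-way merge of the sorted buckets using a heap of {bucketIndex, position}. */
    public static <T> List<T> mergeSorted(List<List<T>> buckets, Comparator<? super T> cmp) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> cmp.compare(buckets.get(a[0]).get(a[1]), buckets.get(b[0]).get(b[1])));
        for (int i = 0; i < buckets.size(); i++) {
            if (!buckets.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<T> bucket = buckets.get(top[0]);
            out.add(bucket.get(top[1]));
            if (top[1] + 1 < bucket.size()) {
                heap.add(new int[] {top[0], top[1] + 1}); // advance within this bucket
            }
        }
        return out;
    }
}
```

For example, a stream 1, 3, 5, 2, 4, 6 would be cut into the buckets [1, 3, 5] 
and [2, 4, 6] and merged back into one sorted sequence. A real client would 
presumably merge incrementally rather than buffer everything first.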



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)