[
https://issues.apache.org/jira/browse/PHOENIX-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640298#comment-16640298
]
Lars Hofhansl commented on PHOENIX-4932:
----------------------------------------
{quote}
Not 100% sure I understand what you mean. The aggregations computed by Phoenix
can be recomputed if this state is detected when a split occurs as the rows are
being traversed (since the row would not have been returned from the
resultSet.next() call). I had a PR out for this, but it got more complicated
than I liked. Maybe we can look at this on a case-by-case basis to see if there
are times when it cannot be redone?
{quote}
I'm trying to find out whether we can teach the client to be generally more
tolerant of HBase restarting scans due to splits (like the scenarios I
listed in the description). That way HBase can do its thing and Phoenix still
provides correct results.
(Like PHOENIX-4849, after which Phoenix no longer cares about SPLITs for
simple SELECTs.)
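
To make the aggregate case concrete, here is a minimal sketch of a client that
tolerates more than one partial result per scan. It is not Phoenix code:
PartialAgg, merge() and avg() are hypothetical names made up for illustration,
and it assumes each partial result carries a SUM and a COUNT.

```java
import java.util.List;

// Hypothetical holder for one partial aggregate returned by a scan.
// Not a Phoenix class; all names here are illustrative assumptions.
final class PartialAgg {
    final long sum;
    final long count;

    PartialAgg(long sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    // Combine any number of partials. A scan that was continued across
    // daughter regions after a split simply contributes extra entries.
    static PartialAgg merge(List<PartialAgg> partials) {
        long sum = 0;
        long count = 0;
        for (PartialAgg p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new PartialAgg(sum, count);
    }

    // AVG is derived only after every partial has been merged.
    double avg() {
        if (count == 0) {
            throw new IllegalStateException("no rows aggregated");
        }
        return (double) sum / count;
    }
}
```

With this shape the client does not need to know how many partials a scan
produced, so a split that turns one result into two would not change the
final SUM or AVG.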
> Brainstorm more ways to avoid special SPLIT handling in Phoenix
> ---------------------------------------------------------------
>
> Key: PHOENIX-4932
> URL: https://issues.apache.org/jira/browse/PHOENIX-4932
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Lars Hofhansl
> Priority: Major
>
> Currently Phoenix still requires special handling and retries (automated, and
> manual by the client user) when SPLITs occur in HBase.
> PHOENIX-4849 avoids that for "simple" SELECTs. I think we can go further if
> we add a bit more logic to the client like this:
> * Sorts. As we merge sort partial server results from the server scan, start
> a "merge bucket" when we see that the next K/V is out of order (that can
> happen when HBase executes partial scans across the new daughter regions).
> * Aggregates. Make sure the client can deal with more than one result per
> scan. E.g. for a SUM the scanner might return two results if HBase splits the
> scan across two regions. Similarly for AVG, the client needs to deal with two
> sets of SUM/COUNT.
> * Offset. Make sure the client applies the offset. The server might return
> more. (This might be more complicated... I haven't looked too closely.)
> In summary: We should let HBase do its thing as much as possible. HBase
> already deals with SPLITs: scans are restarted and continue across regions,
> the region cache on the client is invalidated, etc.
> Just parking this here. This is not new. The ideas are probably not new
> either.
> [~tdsilva], FYI.
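
The "merge bucket" idea from the Sorts bullet could look roughly like the
sketch below. This is an illustration under stated assumptions, not Phoenix
code: rows are reduced to plain integer sort keys, and MergeBuckets,
toBuckets() and merge() are hypothetical names. A new bucket starts whenever
the next key is smaller than the previous one, and the buckets are then
combined with an ordinary k-way merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Phoenix or HBase.
final class MergeBuckets {

    // Cut the incoming stream into sorted runs ("buckets"). A new bucket
    // starts whenever a key is smaller than its predecessor.
    static List<List<Integer>> toBuckets(List<Integer> stream) {
        List<List<Integer>> buckets = new ArrayList<>();
        List<Integer> current = null;
        Integer prev = null;
        for (Integer key : stream) {
            if (current == null || key < prev) {
                current = new ArrayList<>();
                buckets.add(current);
            }
            current.add(key);
            prev = key;
        }
        return buckets;
    }

    // k-way merge of the sorted buckets into one sorted list.
    static List<Integer> merge(List<List<Integer>> buckets) {
        // Heap entries are {bucketIndex, positionInBucket}, ordered by the
        // key they currently point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> Integer.compare(
                buckets.get(a[0]).get(a[1]),
                buckets.get(b[0]).get(b[1])));
        for (int i = 0; i < buckets.size(); i++) {
            if (!buckets.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<Integer> bucket = buckets.get(top[0]);
            out.add(bucket.get(top[1]));
            if (top[1] + 1 < bucket.size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return out;
    }
}
```

A real client would merge lazily while rows are still arriving rather than
buffering whole buckets; the sketch buffers only to keep the idea short.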