[
https://issues.apache.org/jira/browse/PHOENIX-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230724#comment-17230724
]
Kadir OZDEMIR commented on PHOENIX-6207:
----------------------------------------
[~comnetwork], what I am trying to achieve in the bigger picture is to
eliminate HBase client timeouts during table region scans. The idea is to break
the operations on the server side into smaller chunks (i.e., pages) such that
processing a single page never takes long enough to cause a timeout. When I
created the Jiras for this (PHOENIX-5998, PHOENIX-6207 and PHOENIX-6211), I was
thinking that a page could be expressed in terms of rows, after seeing that
Ungrouped Aggregate Region Observer did not put any limit on the number of rows
(actually it implemented the entire aggregation in doPostScannerOpen()). That's
why these Jiras mention putting a limit on the number of rows. While working on
PHOENIX-5998, I found that a better approach is to limit the processing time of
a page rather than the number of rows processed in it, because it is hard to
predict how long it will take to process N rows. I will update the descriptions
of these Jiras.
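As a rough illustration of the time-based idea (this is not the code in the
patches; the class, method and parameter names are hypothetical, and a plain
iterator of numbers stands in for the region scanner), a page loop that stops
on elapsed time instead of on a row count might look like this:

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

public class TimeBasedPagingSketch {

    /** Result of one page: a partial aggregate and whether the scan is finished. */
    public static final class Page {
        public final long partialSum;
        public final int rowsProcessed;
        public final boolean done;

        Page(long partialSum, int rowsProcessed, boolean done) {
            this.partialSum = partialSum;
            this.rowsProcessed = rowsProcessed;
            this.done = done;
        }
    }

    /**
     * Aggregates rows until the input is exhausted or pageTimeMs has elapsed.
     * The deadline is checked after each row, so every page processes at
     * least one row and the scan always makes progress.
     */
    public static Page nextPage(Iterator<Long> rows, long pageTimeMs, LongSupplier clockMs) {
        long deadline = clockMs.getAsLong() + pageTimeMs;
        long sum = 0;
        int count = 0;
        while (rows.hasNext()) {
            sum += rows.next();
            count++;
            if (clockMs.getAsLong() >= deadline) {
                break; // page time budget used up; return the partial result
            }
        }
        return new Page(sum, count, !rows.hasNext());
    }

    public static void main(String[] args) {
        Iterator<Long> rows = List.of(1L, 2L, 3L, 4L, 5L).iterator();
        // Fake clock (an assumption for the demo): advances 10 ms on every reading.
        long[] now = {0};
        LongSupplier clock = () -> now[0] += 10;

        long total = 0;
        int pages = 0;
        Page page;
        do {
            page = nextPage(rows, 25, clock); // 25 ms page budget
            total += page.partialSum;         // caller merges partial aggregates
            pages++;
        } while (!page.done);
        System.out.println("total=" + total + " pages=" + pages); // total=15 pages=2
    }
}
```

In this sketch the deadline is only checked between rows, so the time spent
inside any single rows.next() call is not bounded by the page budget.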
The latest patches for PHOENIX-5998 and PHOENIX-6207 implement time-based
paging. We also need time-based paging within filters (PHOENIX-6211). Paging
at the coproc level cannot limit the time spent in each individual next
operation of a region scanner, so on its own it is not sufficient to prevent
client timeouts. Because of delete markers, row versions and very selective
filters, a single next (nextRaw()) operation can take a long time. PHOENIX-6211
is a tough nut to crack. I will write a design doc to explain the overall
approach.
> Paged server side grouped aggregate operations
> ----------------------------------------------
>
> Key: PHOENIX-6207
> URL: https://issues.apache.org/jira/browse/PHOENIX-6207
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.0.0, 4.14.3
> Reporter: Kadir OZDEMIR
> Assignee: Kadir OZDEMIR
> Priority: Major
> Fix For: 4.16.0
>
> Attachments: PHOENIX-6207.4.x.001.patch, PHOENIX-6207.4.x.002.patch,
> PHOENIX-6207.4.x.003.patch
>
>
> Phoenix provides the option of performing query operations on the client or
> server side. This is decided by the Phoenix optimizer based on configuration
> parameters. For the server side option, the table operation is parallelized
> such that multiple table regions are scanned. However, currently there is no
> paging capability and the server side operation can take long enough to lead
> to HBase client timeouts. Putting a limit on the number of rows to be processed
> within a single RPC call (i.e., the next operation on the scanner) on the
> server side using Phoenix-level paging is highly desirable. This paging
> mechanism has already been implemented for index rebuild and verification
> operations and has proven effective at preventing timeouts. This Jira is for
> implementing this paging for the server side grouped aggregate operations.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)