[
https://issues.apache.org/jira/browse/PHOENIX-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791467#comment-17791467
]
ASF GitHub Bot commented on PHOENIX-7106:
-----------------------------------------
virajjasani commented on PR #1736:
URL: https://github.com/apache/phoenix/pull/1736#issuecomment-1833138325
Need to resolve the merge conflict.
> Invalid rowkey returned by coproc can cause data integrity issues
> -----------------------------------------------------------------
>
> Key: PHOENIX-7106
> URL: https://issues.apache.org/jira/browse/PHOENIX-7106
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
>
> The HBase scanner interface expects the server to scan cells from HFiles or
> the block cache and return consistent data, i.e. the rowkeys of the returned
> cells should stay within the scan boundaries. When a region moves and the
> scanner needs a reset, or when the current row is too large and the server
> returns a partial row, the subsequent scanner#next call is supposed to return
> the remaining cells. When this happens, the cell rowkeys returned by the
> server side, i.e. by any coprocessors, are expected to be within the scan
> boundaries so that the server can reliably perform its validation and return
> the remaining cells as expected.
> The Phoenix client initiates serial or parallel scans from the aggregators
> based on the region boundaries, and the scan boundaries are sometimes
> adjusted based on the key ranges provided by the where optimizer, to include
> tenant boundaries, salt boundaries, etc. After the client opens the scanner
> and performs the scan operation, some of the coprocs return an invalid rowkey
> in the following cases:
> # Grouped aggregate queries
> # Ungrouped aggregate queries (not all of them)
> # Offset queries
> # Some dummy cells returned with empty rowkey
> # Update statistics queries
> # Local indexes
> Since many of these cases return reserved rowkeys, the returned rowkeys are
> unlikely to match the scan or region boundaries. This has the potential to
> cause data integrity issues in certain scenarios, as explained above. An
> empty rowkey returned by the server can be treated as the end of the region
> scan by the HBase client.
> With the paging feature enabled, if the page size is kept low, there is a
> higher chance of scanners returning dummy cells, resulting in an increased
> number of RPC calls in exchange for better latency and timeout behavior. We
> should return only valid rowkeys within the scan range in all cases where we
> perform the operations mentioned above, such as complex aggregate or offset
> queries.
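
For readers following the thread, the "rowkey stays within the scan boundaries"
requirement described above can be illustrated with a minimal sketch. This is
not code from the PR or from Phoenix/HBase: the class and method names are
hypothetical, it uses the plain JDK Arrays.compareUnsigned (Java 9+) rather
than any HBase utility, and it assumes the usual HBase scan conventions
(inclusive start row, exclusive stop row, empty byte array meaning
"unbounded").

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Phoenix or HBase.
final class ScanRangeCheck {
    private ScanRangeCheck() {}

    /**
     * Returns true if rowKey lies in [startRow, stopRow).
     * Assumption: an empty startRow or stopRow means that side is unbounded.
     * Assumption: an empty rowKey is never a valid rowkey to return, since
     * the description notes the client may read it as end of region scan.
     */
    static boolean isWithinScanRange(byte[] rowKey, byte[] startRow, byte[] stopRow) {
        if (rowKey.length == 0) {
            return false;
        }
        // Lexicographic comparison on unsigned bytes, the ordering rowkeys use.
        boolean atOrAfterStart =
                startRow.length == 0 || Arrays.compareUnsigned(rowKey, startRow) >= 0;
        boolean beforeStop =
                stopRow.length == 0 || Arrays.compareUnsigned(rowKey, stopRow) < 0;
        return atOrAfterStart && beforeStop;
    }
}
```

A coprocessor-side check along these lines would reject a reserved or empty
rowkey before it is handed back to the client; what to return instead (for
example, a valid rowkey inside the range) is a design decision for the actual
fix and is not shown here.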
--
This message was sent by Atlassian Jira
(v8.20.10#820010)