[ https://issues.apache.org/jira/browse/PHOENIX-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani updated PHOENIX-7106:
----------------------------------
    Priority: Critical  (was: Major)

> Invalid rowkey returned by coproc can cause data integrity issues
> -----------------------------------------------------------------
>
>                 Key: PHOENIX-7106
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7106
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Critical
>             Fix For: 5.2.0, 5.1.4
>
>
> The HBase scanner interface expects the server to scan cells from the
> HFile or block cache and return consistent data, i.e. the rowkey of every
> returned cell should stay within the scan boundaries. When a region moves
> and the scanner needs a reset, or when the current row is too large and the
> server returns a partial row, the subsequent scanner#next call is supposed
> to return the remaining cells. When this happens, the cell rowkeys returned
> by the server, including any coprocessors, are expected to be within the
> scan boundary range so that the server can reliably perform its validation
> and return the remaining cells as expected.
> The Phoenix client initiates serial or parallel scans from the aggregators
> based on the region boundaries, and the scan boundaries are sometimes
> adjusted based on the key ranges provided by the optimizer, to include
> tenant boundaries, salt boundaries, etc. After the client opens the scanner
> and performs the scan operation, some of the coprocs return an invalid
> rowkey in the following cases:
> # Grouped aggregate queries
> # Ungrouped aggregate queries (not all of them)
> # Offset queries
> # Some dummy cells returned with empty rowkey
> # Update statistics queries
> # Local indexes
> Since many of these cases return reserved rowkeys, they are unlikely to
> match the scan or region boundaries. This has the potential to cause data
> integrity issues in certain scenarios, as explained above. An empty rowkey
> returned by the server can be treated as the end of the region scan by the
> HBase client.
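
To make the "rowkey within scan boundaries" expectation concrete, here is a minimal Java sketch of such a range check. It is not Phoenix or HBase code: the class `ScanRangeCheck` and the method `isInScanRange` are hypothetical names chosen for illustration. It assumes the usual HBase convention of an inclusive start row, an exclusive stop row, and an empty byte array meaning "unbounded", and it compares rowkeys as unsigned bytes in lexicographic order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Phoenix or HBase.
final class ScanRangeCheck {

    // Returns true if rowKey lies in [startRow, stopRow).
    // Assumption: an empty startRow or stopRow means that side is unbounded.
    static boolean isInScanRange(byte[] rowKey, byte[] startRow, byte[] stopRow) {
        if (rowKey == null || rowKey.length == 0) {
            // An empty rowkey never identifies a real row, so treat it as invalid.
            return false;
        }
        // Unsigned lexicographic comparison (Java 9+).
        boolean atOrAfterStart = startRow.length == 0
                || Arrays.compareUnsigned(rowKey, startRow) >= 0;
        boolean beforeStop = stopRow.length == 0
                || Arrays.compareUnsigned(rowKey, stopRow) < 0;
        return atOrAfterStart && beforeStop;
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] start = bytes("row1");
        byte[] stop = bytes("row9");

        // A rowkey inside the scan range.
        System.out.println(isInScanRange(bytes("row5"), start, stop));
        // An empty rowkey, as in the dummy-cell case described above.
        System.out.println(isInScanRange(new byte[0], start, stop));
        // A rowkey that sorts after the stop row.
        System.out.println(isInScanRange(bytes("zzz"), start, stop));
    }
}
```

Under these assumptions, only the first call reports a rowkey inside the range; the empty rowkey and the key past the stop row are both rejected, which are the kinds of values the cases listed above can produce.
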
> With the paging feature enabled, if the page size is kept low, scanners are
> more likely to return dummy cells, which increases the number of RPC calls
> in exchange for better latency and fewer timeouts. We should return only
> valid rowkeys within the scan range in all the cases listed above, such as
> complex aggregate or offset queries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)