[jira] [Commented] (PHOENIX-5577) Uncovered columns are retrieved one-by-one in local index scans.

Lars Hofhansl (Jira) Sun, 17 Nov 2019 16:17:22 -0800


    [ https://issues.apache.org/jira/browse/PHOENIX-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976207#comment-16976207 ]


Lars Hofhansl commented on PHOENIX-5577:
----------------------------------------

Looked into various ways to solve this, but no good solution has emerged yet. 
This is not logically hard to do, just hard to map onto the current 
RegionScanner and RegionObserver framework.

Options:
* Use the RegionObserver.postScannerNext(...) hook to fill in the missing 
columns. The complication there is that the original Scan object is no longer 
known, and also that the merging scanner might have been wrapped multiple 
times, making it hard to know what to do.
* Invent a Lazy or Deferred subclass of RegionScanner, which records which rows 
still have columns to be merged in but doesn't do the merging until the 
columns are actually needed, at which point it can do it for all rows 
recorded so far.
* more...?

In either case, this would collect the rows that need columns merged and then 
merge them all with a single region-local SkipScan.
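A minimal sketch of the second (Deferred) option, written in plain Java rather than against the real RegionScanner interface so that it runs on its own. Everything here is hypothetical: the class name `DeferredMergeScanner`, the `batchFetch` function (a stand-in for the region-local SkipScan), and representing a merged row as a `key:value` string. It only shows the buffering logic: rows are recorded until a batch is full, and their uncovered columns are then fetched with one call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical sketch of a deferred-merge scanner; not Phoenix code.
 * Index rows (represented only by their data-row keys) are buffered until
 * batchSize of them are pending; their uncovered columns are then fetched
 * with ONE call to batchFetch, which stands in for a region-local SkipScan.
 */
class DeferredMergeScanner implements Iterator<String> {
    private final Iterator<String> indexRows;
    private final Function<List<String>, Map<String, String>> batchFetch;
    private final int batchSize;
    private final List<String> merged = new ArrayList<>(); // rows with columns filled in
    private int readPos = 0;
    int fetchCalls = 0; // number of batched lookups issued so far

    DeferredMergeScanner(Iterator<String> indexRows,
                         Function<List<String>, Map<String, String>> batchFetch,
                         int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.indexRows = indexRows;
        this.batchFetch = batchFetch;
        this.batchSize = batchSize;
    }

    /** Record up to batchSize pending rows, then merge all of them with a single lookup. */
    private void mergeNextBatch() {
        merged.clear();
        readPos = 0;
        List<String> pending = new ArrayList<>();
        while (pending.size() < batchSize && indexRows.hasNext()) {
            pending.add(indexRows.next());
        }
        if (pending.isEmpty()) return; // index scan exhausted, nothing to fetch
        Map<String, String> uncovered = batchFetch.apply(pending);
        fetchCalls++;
        for (String key : pending) {
            merged.add(key + ":" + uncovered.get(key)); // preserve index-scan order
        }
    }

    @Override
    public boolean hasNext() {
        // Columns are only fetched when a caller actually needs the next row.
        if (readPos >= merged.size()) mergeNextBatch();
        return readPos < merged.size();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return merged.get(readPos++);
    }
}
```

With `batchSize` set to 2 and five index rows, this issues three batched lookups instead of five per-row Gets. Doing the merge inside `hasNext()` is what makes it "deferred": no uncovered columns are read until the consumer asks for another row.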


> Uncovered columns are retrieved one-by-one in local index scans.
> ----------------------------------------------------------------
>
>                 Key: PHOENIX-5577
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5577
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Priority: Major
>              Labels: performance
>
> One of the strengths of local indexes is that they are the only indexes that 
> work when not all columns needed for a query are covered by (i.e. copied 
> into) the index, allowing them to be *much* smaller. However, the merging of 
> the missing columns is done row by row.
> See RegionScannerFactory.getWrappedScanner(...) -> new 
> RegionScanner(...).nextRaw(...) -> IndexUtil.wrapResultUsingOffset(...)
> For index scans this issues a Get back to the same region for each single 
> scanned row. While the Get is local, it still needs to set up a scanner and 
> seek to the right key each time. This is pretty inefficient. Local indexes 
> could be much, much faster at read time for larger scans. This should use a 
> SkipScan instead for a batch of scanned keys.
> (This is mitigated somewhat by setting the block encoding to ROW_INDEX_V1, 
> but it is still less than ideal.)
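To illustrate why one SkipScan over a batch beats one Get per scanned row, here is a toy in-memory model in Java. A `TreeMap` stands in for the region's sorted rows, and "setting up a scanner" is just a counter; `RegionModel`, `getPerRow`, and `skipScanBatch` are hypothetical names, and the model ignores real costs such as block reads and seeks inside HFiles. The batched path sorts and deduplicates the keys first, because data rows reached through a local index arrive in index order rather than data-row-key order, and then seeks forward to each key in turn from a single scanner.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical in-memory model of a region: sorted row key -> uncovered column value.
 * Contrasts one lookup per row with a single forward-only pass over a sorted batch
 * of keys (the access pattern a SkipScan provides). Not Phoenix or HBase code.
 */
class RegionModel {
    final NavigableMap<String, String> rows = new TreeMap<>();
    int scannersOpened = 0; // each "scanner setup" is a fresh positioning cost

    /** One-by-one: every key gets its own scanner, like a Get per scanned index row. */
    Map<String, String> getPerRow(List<String> keys) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String key : keys) {
            scannersOpened++;
            String value = rows.get(key);
            if (value != null) out.put(key, value);
        }
        return out;
    }

    /** Batched: sort and dedupe the keys, open ONE scanner, and only ever seek forward. */
    Map<String, String> skipScanBatch(List<String> keys) {
        Map<String, String> out = new LinkedHashMap<>();
        TreeSet<String> sorted = new TreeSet<>(keys); // skip-scan key ranges must be sorted
        if (sorted.isEmpty()) return out;
        scannersOpened++;
        for (String key : sorted) {
            // Seek to the first row >= key; sorted keys mean the position never moves back.
            Map.Entry<String, String> row = rows.ceilingEntry(key);
            if (row == null) break; // past the last row: no remaining key can match
            if (row.getKey().equals(key)) out.put(key, row.getValue());
        }
        return out;
    }
}
```

For the keys `c, a, d, a, x` against rows `a..d`, the per-row path sets up five scanners and the batched path sets up one, while both return the same columns. The batched result comes back in row-key order, so a caller would still need to reattach the values to index rows by key.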



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
