[
https://issues.apache.org/jira/browse/PHOENIX-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976207#comment-16976207
]
Lars Hofhansl commented on PHOENIX-5577:
----------------------------------------
Looked into various ways to solve this, but no good solution has emerged yet. This is
not logically hard to do, just hard to map onto the current RegionScanner and
RegionObserver framework.
Options:
* Use the RegionObserver.postScannerNext(...) hook to fill in the missing
columns. The complication there is that the original Scan object is no longer
known, and also that the merging scanner might have been wrapped multiple
times, making it hard to know what to do.
* Invent a Lazy or Deferred subclass of RegionScanner, which records which rows
still have columns to be merged in, but doesn't do the merging until the
columns are actually needed, at which point it can then do it for all rows
recorded so far.
* more...?
In either case, this would collect the rows that need columns merged and then
fetch those columns with a single region-local SkipScan.
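
To make the batching idea concrete, here is a rough sketch in plain Java. It is
a toy model only and uses no HBase or Phoenix APIs: a sorted map stands in for
the data rows of one region, and every name in it (DeferredMergeSketch,
mergePerRow, mergeBatched, skipScan, scannerSetups) is hypothetical. It only
illustrates buffering the scanned keys and resolving each batch in one ordered
pass, compared with one lookup per row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model: a sorted map stands in for the data rows of one region. */
class DeferredMergeSketch {
    /** Counts simulated scanner setups so the two strategies can be compared. */
    static int scannerSetups = 0;

    /** Per-row strategy: one simulated Get (one scanner setup) per index row. */
    static List<String> mergePerRow(List<String> indexKeys,
                                    NavigableMap<String, String> dataRows) {
        List<String> out = new ArrayList<>();
        for (String key : indexKeys) {
            scannerSetups++; // every Get sets up its own scanner
            out.add(key + "=" + dataRows.get(key));
        }
        return out;
    }

    /** Batched strategy: buffer keys, resolve each full batch in one pass. */
    static List<String> mergeBatched(List<String> indexKeys,
                                     NavigableMap<String, String> dataRows,
                                     int batchSize) {
        List<String> out = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (String key : indexKeys) {
            pending.add(key);
            if (pending.size() == batchSize) {
                flush(pending, dataRows, out);
            }
        }
        flush(pending, dataRows, out); // resolve the final partial batch
        return out;
    }

    /** Resolves all pending keys at once and emits them in index scan order. */
    private static void flush(List<String> pending,
                              NavigableMap<String, String> dataRows,
                              List<String> out) {
        if (pending.isEmpty()) {
            return;
        }
        Map<String, String> found = skipScan(dataRows, new TreeSet<>(pending));
        for (String key : pending) {
            out.add(key + "=" + found.get(key));
        }
        pending.clear();
    }

    /** Simulated skip scan: one scanner setup, keys visited in sorted order. */
    private static Map<String, String> skipScan(NavigableMap<String, String> dataRows,
                                                TreeSet<String> wanted) {
        scannerSetups++; // a single scanner serves the whole batch
        Map<String, String> found = new HashMap<>();
        for (String key : wanted) { // ascending, so the scan only moves forward
            String value = dataRows.get(key);
            if (value != null) {
                found.put(key, value);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> data = new TreeMap<>();
        for (String k : Arrays.asList("r1", "r2", "r3", "r4", "r5")) {
            data.put(k, "v-" + k);
        }
        // Index order usually differs from data row key order.
        List<String> indexOrder = Arrays.asList("r4", "r1", "r5", "r2", "r3");
        System.out.println(mergeBatched(indexOrder, data, 2));
        System.out.println("scanner setups: " + scannerSetups);
    }
}
```

The batch size here is an arbitrary knob. In the real scanner the flush point
would presumably be tied to when the caller actually needs the merged columns,
which is exactly the part that is hard to fit into the existing hooks.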
> Uncovered columns are retrieved one-by-one in local index scans.
> ----------------------------------------------------------------
>
> Key: PHOENIX-5577
> URL: https://issues.apache.org/jira/browse/PHOENIX-5577
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Lars Hofhansl
> Priority: Major
> Labels: performance
>
> One of the strengths of local indexes is that they are the only indexes that
> work when not all columns needed for a query are covered by (i.e. copied
> into) the index, allowing them to be *much* smaller. However, the merging of
> the missing columns is done one-by-one per row.
> See RegionScannerFactory.getWrappedScanner(...) -> new
> RegionScanner(...).nextRaw(...) -> IndexUtil.wrapResultUsingOffset(...)
> For index scans this issues a Get back to the same region for each single
> scanned row. While the Get is local, it still needs to set up a scanner and
> seek to the right key each time. This is pretty inefficient. Local indexes
> could be much, much faster at read time for larger scans. This should use a
> SkipScan instead for a batch of scanned keys.
> (This is mitigated somewhat by setting the block encoding to ROW_INDEX_V1, but
> still less than ideal.)