[
https://issues.apache.org/jira/browse/PHOENIX-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302742#comment-17302742
]
Kadir Ozdemir commented on PHOENIX-6412:
----------------------------------------
I think the skip scan would help in cases where the data table rows cannot be
cached effectively and the data table rows to be visited are not uniformly
distributed, yet are still not accessed in row key order. Assume that we batch
8K rows at a time, but these rows are stored in 1K blocks in random order. With
the skip scan we would load 1K blocks into memory. Without the skip scan, we
can end up loading 8K blocks (i.e., each block is loaded 8 times) if these
blocks cannot be cached.
It will not be easy to reproduce this case, but I believe it will be a common
enough case to see in real-life scenarios.
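To make the arithmetic above concrete, here is a small self-contained Java simulation of the two access patterns. The 8-rows-per-block layout and the no-cache assumption are illustrative choices consistent with the numbers in the comment, not something measured against HBase:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class BlockLoadSimulation {
    // Without the skip scan: one Get per row; if the blocks cannot be cached,
    // every Get loads the data block that holds its row.
    static long loadsWithoutSkipScan(List<Integer> rowKeys) {
        return rowKeys.size();
    }

    // With the skip scan: the batch is sorted into row key order first, so the
    // scanner sweeps forward and reads each block only once.
    static long loadsWithSkipScan(List<Integer> rowKeys, int rowsPerBlock) {
        List<Integer> sorted = new ArrayList<>(rowKeys);
        Collections.sort(sorted);
        long loads = 0;
        int lastBlock = -1;
        for (int key : sorted) {
            int block = key / rowsPerBlock;
            if (block != lastBlock) {
                loads++;
                lastBlock = block;
            }
        }
        return loads;
    }

    public static void main(String[] args) {
        final int numRows = 8192;    // "8K rows at a time"
        final int rowsPerBlock = 8;  // illustrative: 8K rows spread over 1K blocks

        // The index returns rows in random order relative to the data table.
        List<Integer> rowKeys = new ArrayList<>();
        for (int i = 0; i < numRows; i++) rowKeys.add(i);
        Collections.shuffle(rowKeys, new Random(42));

        System.out.println(loadsWithoutSkipScan(rowKeys));            // 8192
        System.out.println(loadsWithSkipScan(rowKeys, rowsPerBlock)); // 1024
    }
}
```

With a batch of 8192 random rows spread over 1024 blocks, the sorted sweep loads each block exactly once (1K loads), while per-row access with no caching loads 8K blocks, an 8x difference.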
> Consider batching uncovered column merge for local indexes
> ----------------------------------------------------------
>
> Key: PHOENIX-6412
> URL: https://issues.apache.org/jira/browse/PHOENIX-6412
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Lars Hofhansl
> Priority: Minor
> Fix For: 5.2.0
>
> Attachments: 6412-hack.txt
>
>
> Currently uncovered columns are merged row-by-row, performing a Get to the
> data region for each matching row in the index region.
> Each Get needs to seek all the store scanners, and doing this per row is
> quite expensive.
> Instead we could batch inside the RegionScannerFactory.getWrappedScanner() ->
> RegionScanner.nextRaw() method. Collect N index rows and then execute a
> single skip scan on the data region.
> I might be able to get to that, but if there's someone who is interested in
> taking this up I would not mind :)
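The batching idea from the description can be sketched without any HBase dependencies: collect up to N data-table row keys from the index rows, then sort them so a single skip scan can visit the data region in ascending row key order. The class and method names here are hypothetical illustrations, not the actual Phoenix API; in Phoenix the sorted batch would become the ranges of a skip scan against the data region:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class UncoveredColumnBatcher {
    // Collect up to batchSize data-table row keys referenced by index rows,
    // then sort them so one skip scan can merge the uncovered columns instead
    // of issuing a Get per row. (Hypothetical helper for illustration only.)
    static List<String> collectBatch(Iterator<String> indexRowKeys, int batchSize) {
        List<String> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && indexRowKeys.hasNext()) {
            batch.add(indexRowKeys.next());
        }
        Collections.sort(batch); // skip scan ranges must be ascending
        return batch;
    }

    public static void main(String[] args) {
        // Row keys as they arrive from the index region, i.e. not in
        // data-table row key order.
        List<String> fromIndex = List.of("r07", "r02", "r91", "r10", "r05");
        Iterator<String> it = fromIndex.iterator();
        System.out.println(collectBatch(it, 4)); // first batch of 4, sorted
        System.out.println(collectBatch(it, 4)); // remainder of the iterator
    }
}
```

Each returned batch would then be executed as one skip scan on the data region, amortizing the per-Get seek cost across N rows.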
--
This message was sent by Atlassian Jira
(v8.3.4#803005)