[ 
https://issues.apache.org/jira/browse/PHOENIX-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302858#comment-17302858
 ] 

Lars Hofhansl commented on PHOENIX-6412:
----------------------------------------

I ran this through the profiler and confirmed what I found. With FAST_DIFF 
there is little difference between SEEK and RESEEK, especially when the RESEEK 
is "far enough" to reach another seed Cell (which is almost always the case 
with random seeks). So a skip scan has no advantage over many Get 
requests.

With ROW_INDEX_V1 RESEEK is much faster than SEEK.

Since each Get requires SEEK'ing all the involved scanners each time, we see 
the difference when issuing many Gets as opposed to a single skip scan, which 
SEEKs once, followed by many RESEEKs.
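
To make the counting argument concrete, here is a rough toy model in Java. It 
uses no HBase or Phoenix classes; SeekCostSketch, Counts, manyGets and skipScan 
are hypothetical names made up for illustration, and a sorted map stands in for 
the data region. It only tallies how many SEEKs and RESEEKs each access pattern 
would issue, assuming every involved store scanner is positioned once per 
lookup. It says nothing about how expensive each operation actually is under a 
given block encoding.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;

/** Toy model only: a sorted map stands in for the data region. */
class SeekCostSketch {

    /** Tally of full SEEKs and forward RESEEKs issued by an access pattern. */
    static final class Counts {
        int seeks;
        int reseeks;
    }

    /** One Get per row: every lookup positions all store scanners from scratch. */
    static Counts manyGets(NavigableMap<String, String> dataRegion,
                           List<String> rowKeys, int storeScanners,
                           List<String> out) {
        Counts c = new Counts();
        for (String key : rowKeys) {
            c.seeks += storeScanners; // assumption: one SEEK per scanner per Get
            String value = dataRegion.get(key);
            if (value != null) {
                out.add(value);
            }
        }
        return c;
    }

    /** One skip scan over a sorted batch: SEEK once, then RESEEK forward. */
    static Counts skipScan(NavigableMap<String, String> dataRegion,
                           List<String> rowKeys, int storeScanners,
                           List<String> out) {
        Counts c = new Counts();
        List<String> sorted = new ArrayList<>(rowKeys);
        Collections.sort(sorted); // a skip scan visits keys in row-key order
        boolean first = true;
        for (String key : sorted) {
            if (first) {
                c.seeks += storeScanners;   // initial positioning of each scanner
                first = false;
            } else {
                c.reseeks += storeScanners; // later keys only move forward
            }
            String value = dataRegion.get(key);
            if (value != null) {
                out.add(value);
            }
        }
        return c;
    }
}
```

In this model, N lookups against S store scanners cost N*S SEEKs as separate 
Gets, versus S SEEKs plus (N-1)*S RESEEKs as one skip scan. Whether that is a 
win depends on RESEEK being cheaper than SEEK, which is what the profiler 
shows for ROW_INDEX_V1 but not for FAST_DIFF.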

 

> Consider batching uncovered column merge for local indexes
> ----------------------------------------------------------
>
>                 Key: PHOENIX-6412
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6412
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Priority: Minor
>             Fix For: 5.2.0
>
>         Attachments: 6412-hack.txt
>
>
> Currently uncovered columns are merged row-by-row, performing a Get to the 
> data region for each matching row in the index region.
> Each Get needs to seek all the store scanners, and doing this per row is 
> quite expensive.
> Instead we could batch inside the RegionScannerFactory.getWrappedScanner() -> 
> RegionScanner.nextRaw() method. Collect N index rows and then execute a 
> single skip scan on the data region. 
> I might be able to get to that, but if there's someone who is interested in 
> taking this up I would not mind :)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
