[ 
https://issues.apache.org/jira/browse/PHOENIX-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302238#comment-17302238
 ] 

Lars Hofhansl edited comment on PHOENIX-6412 at 3/16/21, 6:07 AM:
------------------------------------------------------------------

Performance-wise, though, I see hardly any improvement when using FAST_DIFF on 
the main data CF.

Looks like RESEEK with FAST_DIFF is hardly any faster than a full SEEK each 
time. Since the data region is local there is no RPC overhead. All the time is 
simply spent in the FAST_DIFF decoder.

I did see a 2x improvement when I switched the block encoding to ROW_INDEX_V1 
(6.5s as opposed to 14s before). Overall, though, this does not seem to be 
worth the effort.

[~kozdemir], FYI. Not what I had expected. But I guess it makes sense.


was (Author: lhofhansl):
Performancewise when using FAST_DIFF on the main data CF, I see hardly any 
improvement, though.

Looks like RESEEK with FAST_DIFF is hardly any faster than a full SEEK each 
time. Since the data region is local there is no RPC overhead. All the time is 
simply spent in the FAST_DIFF decoder.

I did see an improvement when I switch the block encoding to ROW_INDEX_V1. 
Overall, though, this does not seem to be worth the effort.

[~kozdemir], FYI. Not what I had expected. But I guess it makes sense.

> Consider batching uncovered column merge for local indexes
> ----------------------------------------------------------
>
>                 Key: PHOENIX-6412
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6412
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Priority: Minor
>             Fix For: 5.2.0
>
>         Attachments: 6412-hack.txt
>
>
> Currently uncovered columns are merged row-by-row, performing a Get to the 
> data region for each matching row in the index region.
> Each Get needs to seek all the store scanners, and doing this per row is 
> quite expensive.
> Instead we could batch inside the RegionScannerFactory.getWrappedScanner() -> 
> RegionScanner.nextRaw() method. Collect N index rows and then execute a 
> single skip scan on the data region. 
> I might be able to get to that, but if there's someone who is interested in 
> taking this up I would not mind :)
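
To make the batching idea in the description concrete, here is a rough sketch in plain Java. It is a toy model only: a sorted in-memory map stands in for the data region, and the class, method, and counter names (BatchedMergeSketch, mergeRowByRow, mergeBatched, scansOpened) are made up for illustration and are not Phoenix or HBase APIs. The counter is just a stand-in for the per-Get cost of seeking all the store scanners. The sketch also assumes that index rows arrive in index order rather than data row key order, so each batch is sorted before the single pass and the results are emitted in the original index order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model: a NavigableMap plays the role of the data region.
// All names here are hypothetical, not Phoenix or HBase APIs.
final class BatchedMergeSketch {
    // Stand-in for the expensive part: how often a scan/Get is set up.
    static int scansOpened = 0;

    // Row-by-row merge: one point lookup (a "Get") per matching index row.
    static List<String> mergeRowByRow(NavigableMap<String, String> dataRegion,
                                      List<String> indexRowKeys) {
        List<String> out = new ArrayList<>();
        for (String key : indexRowKeys) {
            scansOpened++; // every Get pays the setup cost again
            String value = dataRegion.get(key);
            if (value != null) {
                out.add(key + "=" + value);
            }
        }
        return out;
    }

    // Batched merge: collect up to batchSize index rows, then make one
    // ordered pass over the data region for the whole batch.
    static List<String> mergeBatched(NavigableMap<String, String> dataRegion,
                                     List<String> indexRowKeys, int batchSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < indexRowKeys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, indexRowKeys.size());
            List<String> batch = indexRowKeys.subList(start, end);

            // Sort the data row keys so the pass only ever moves forward.
            TreeSet<String> sortedKeys = new TreeSet<>(batch);

            scansOpened++; // one setup per batch instead of one per row
            Map<String, String> found = new HashMap<>();
            for (String key : sortedKeys) {
                // ceilingEntry models jumping ahead to the next wanted key.
                Map.Entry<String, String> e = dataRegion.ceilingEntry(key);
                if (e != null && e.getKey().equals(key)) {
                    found.put(key, e.getValue());
                }
            }

            // Emit in the original index order.
            for (String key : batch) {
                String value = found.get(key);
                if (value != null) {
                    out.add(key + "=" + value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> dataRegion = new TreeMap<>();
        dataRegion.put("r1", "a");
        dataRegion.put("r2", "b");
        dataRegion.put("r3", "c");
        List<String> indexRowKeys = List.of("r3", "r1", "r2");

        System.out.println(mergeRowByRow(dataRegion, indexRowKeys));
        System.out.println(mergeBatched(dataRegion, indexRowKeys, 2));
        System.out.println("scans opened in total: " + scansOpened);
    }
}
```

Both methods return the same merged rows; only the number of scan setups differs. As the comment above suggests, how much that saves in practice depends on how cheap the forward jump is for the block encoding in use.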



