[ https://issues.apache.org/jira/browse/PHOENIX-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302890#comment-17302890 ]

Lars Hofhansl commented on PHOENIX-6412:
----------------------------------------

[~kadir] Missed your earlier comment. I agree somewhat.

Consider the following:
 # If you want to preserve cache capacity, you would have to instruct the scan 
not to cache blocks. Usually all Gets and Scans cache all the blocks they touch. 
Batched Gets are partitioned by region server on the client, so it is unlikely 
that a block would be evicted from the cache between the batched Gets.
 # For this to really play out, we'd need to expect that HBase block caching 
*and* the OS disk caching are ineffective.
 # The caching only helps if we expect to land multiple seeks in the same 
block. In the general case, and with the default 64K blocks, I believe that to be 
unlikely, although I can of course construct scenarios where this is the case.
 # In these scenarios we get by far the biggest bang for the buck by using 
ROW_INDEX_V1 as the block encoding, as it makes seeking an O(log(n)) type of 
operation (once the block is loaded, that is). Generally, I have come to believe 
that FAST_DIFF is a bad default choice.
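Point 4 can be illustrated with a simplified model (an assumption, not HBase's actual block decoding code). In a delta-encoded block such as FAST_DIFF each key is stored relative to the previous one, so this model assumes a seek has to walk the block from its first cell; with ROW_INDEX_V1 the block carries row offsets, so a seek can binary-search them. `BlockSeekSketch`, `linearSeek` and `indexedSeek` are hypothetical names, and the sketch only counts key comparisons.

```java
/** Hypothetical model of seeking within one already-loaded block. Not HBase code. */
public class BlockSeekSketch {

    /** Models a delta-encoded block (e.g. FAST_DIFF): cells must be decoded in order
     *  from the start of the block. Returns {position of first row >= target, comparisons}. */
    static int[] linearSeek(String[] rows, String target) {
        int comparisons = 0;
        for (int i = 0; i < rows.length; i++) {
            comparisons++;
            if (rows[i].compareTo(target) >= 0) {
                return new int[] {i, comparisons};
            }
        }
        return new int[] {rows.length, comparisons};
    }

    /** Models a block with a row index (e.g. ROW_INDEX_V1): binary search over the
     *  sorted rows. Returns {position of first row >= target, comparisons}. */
    static int[] indexedSeek(String[] rows, String target) {
        int lo = 0;
        int hi = rows.length;
        int comparisons = 0;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            comparisons++;
            if (rows[mid].compareTo(target) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return new int[] {lo, comparisons};
    }

    public static void main(String[] args) {
        // Assumption: ~1,000 small rows fit into one 64K block.
        String[] rows = new String[1000];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = String.format("row%04d", i);
        }
        int[] lin = linearSeek(rows, "row0900");
        int[] idx = indexedSeek(rows, "row0900");
        System.out.println("linear:  pos=" + lin[0] + " comparisons=" + lin[1]);
        System.out.println("indexed: pos=" + idx[0] + " comparisons=" + idx[1]);
    }
}
```

In this model, seeking to `row0900` in a 1,000-row block costs 901 comparisons when walking the block, but at most 10 with the index, which is the O(n) vs. O(log(n)) difference referred to above.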

So I'm not sure the added complexity is justified by the outcome.

 

> Consider batching uncovered column merge for local indexes
> ----------------------------------------------------------
>
>                 Key: PHOENIX-6412
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6412
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Priority: Minor
>             Fix For: 5.2.0
>
>         Attachments: 6412-hack.txt
>
>
> Currently uncovered columns are merged row-by-row, performing a Get to the 
> data region for each matching row in the index region.
> Each Get needs to seek all the store scanners, and doing this per row is 
> quite expensive.
> Instead we could batch inside the RegionScannerFactory.getWrappedScanner() -> 
> RegionScanner.nextRaw() method. Collect N index rows and then execute a 
> single skip scan on the data region. 
> I might be able to get to that, but if there's someone who is interested in 
> taking this up, I would not mind :)
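The batching proposed in the description can be sketched roughly as follows. This is a hypothetical in-memory model, not Phoenix or HBase code: the data region is a `TreeMap`, the counter `scannerOpens` stands in for the cost of seeking all store scanners, and `mergeRowByRow`/`mergeBatched` are made-up names. The real change would live in the wrapped scanner's `nextRaw()` and would issue an actual skip scan against the data region.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of batching the uncovered-column merge. Not Phoenix/HBase code. */
public class BatchedMergeSketch {
    /** Stand-in for "seek all store scanners": counts how often a scanner is positioned. */
    static int scannerOpens = 0;

    /** Current behavior in this model: one Get (one scanner positioning) per index row. */
    static List<String> mergeRowByRow(NavigableMap<String, String> region, List<String> keys) {
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            scannerOpens++;
            String value = region.get(key);
            if (value != null) {
                out.add(key + "=" + value);
            }
        }
        return out;
    }

    /** Batched variant: collect up to batchSize index rows, then do one forward-only
     *  (skip-scan style) pass over the sorted data-row keys of that batch. */
    static List<String> mergeBatched(NavigableMap<String, String> region, List<String> keys, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            List<String> batch = keys.subList(start, Math.min(start + batchSize, keys.size()));
            TreeSet<String> sorted = new TreeSet<>(batch);
            scannerOpens++; // one scanner positioning per batch
            Map<String, String> found = new HashMap<>();
            NavigableMap<String, String> remaining = region.tailMap(sorted.first(), true);
            for (String key : sorted) {
                remaining = remaining.tailMap(key, true); // skip ahead, never move backwards
                Map.Entry<String, String> e = remaining.firstEntry();
                if (e != null && e.getKey().equals(key)) {
                    found.put(key, e.getValue());
                }
            }
            // Emit in the original index-row order so results line up with the index rows.
            for (String key : batch) {
                String value = found.get(key);
                if (value != null) {
                    out.add(key + "=" + value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> region = new TreeMap<>();
        for (int i = 0; i < 100; i++) {
            region.put(String.format("d%03d", i), "v" + i);
        }
        // Data-row keys in index order (not data-key order); "d500" has no data row.
        List<String> keys = List.of("d042", "d007", "d099", "d013", "d500", "d050");

        scannerOpens = 0;
        List<String> a = mergeRowByRow(region, keys);
        int perRowOpens = scannerOpens;

        scannerOpens = 0;
        List<String> b = mergeBatched(region, keys, 4);
        int batchedOpens = scannerOpens;

        System.out.println(a.equals(b) + " " + perRowOpens + " " + batchedOpens);
    }
}
```

Running `main` prints `true 6 2`: both variants return the same rows, but the batched one positions a scanner twice (once per batch of up to 4 keys) instead of once per index row. Re-emitting results in the original index-row order is a design choice of this sketch, since a local index yields rows in index-key order rather than data-key order.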



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
