[jira] [Updated] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

ramkrishna.s.vasudevan (JIRA) Mon, 02 Mar 2015 22:20:23 -0800

     [ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


ramkrishna.s.vasudevan updated HBASE-13109:
-------------------------------------------
    Attachment: nextIndexKVChange_new.patch

[~larsh]
I agree that creating KeyOnlyKeyValue objects is going to have some cost. 
However, it does not copy the contents of the KV; it only wraps them. Your 
first patch copied the key part, so that would have been costlier.
We still have the option of setting the key part on a KeyOnlyKeyValue object. 
So I am attaching a patch that changes nextIndexKey to a Cell using 
KeyOnlyKeyValue, and keeps setting and resetting the nextIndexKey that comes 
from the HFileBlockIndex. 
The main reason I suggest this approach is that for the offheap and ByteBuffer 
(BB) related changes, having Cells would make these compares easier than 
adding new types of comparators. Can you try to profile with this change? If 
you really see that KeyOnlyKV is affecting performance, then +1 on going ahead 
with the pure byte[] based approach. What do you think? Note that this is not 
a patch prepared over your patch, just a suggestion to change nextIndexKey to 
a KV.
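
To illustrate the wrap-versus-copy point, here is a minimal, self-contained 
sketch. It does not use the HBase classes: KeyViewSketch, KeyView, and copyKey 
are hypothetical stand-ins invented for this example, and the unsigned 
byte-wise ordering is an assumption made to keep the comparison simple. It 
only shows the general idea of re-pointing one reusable wrapper at each new 
index key instead of allocating a copy.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch; not the HBase KeyOnlyKeyValue implementation. */
final class KeyViewSketch {

    /** Reusable view over a region of a byte[]; setting it copies no bytes. */
    static final class KeyView {
        private byte[] buf;
        private int offset;
        private int length;

        /** Re-point this view at another key region (the "set and reset" step). */
        void setKey(byte[] buf, int offset, int length) {
            this.buf = buf;
            this.offset = offset;
            this.length = length;
        }

        /** Unsigned lexicographic comparison of the two wrapped regions. */
        int compareTo(KeyView other) {
            return Arrays.compareUnsigned(
                buf, offset, offset + length,
                other.buf, other.offset, other.offset + other.length);
        }
    }

    /** The copying alternative: allocates a new array for every key. */
    static byte[] copyKey(byte[] buf, int offset, int length) {
        return Arrays.copyOfRange(buf, offset, offset + length);
    }

    public static void main(String[] args) {
        // Three 4-byte keys packed into one buffer, standing in for block data.
        byte[] block = "row1row5row9".getBytes(StandardCharsets.US_ASCII);

        KeyView current = new KeyView();
        KeyView nextIndexKey = new KeyView();

        current.setKey(block, 4, 4);       // wraps "row5"
        nextIndexKey.setKey(block, 0, 4);  // wraps "row1"
        System.out.println("current > nextIndexKey: "
            + (current.compareTo(nextIndexKey) > 0));

        // Reuse the same wrapper for the next index key; nothing is allocated.
        nextIndexKey.setKey(block, 8, 4);  // now wraps "row9"
        System.out.println("current < nextIndexKey: "
            + (current.compareTo(nextIndexKey) < 0));
    }
}
```

The wrapper still costs one object plus a few field writes per reset, which is 
the overhead the profiling request above is meant to measure against a pure 
byte[] comparison.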

> Make better SEEK vs SKIP decisions during scanning
> --------------------------------------------------
>
>                 Key: HBASE-13109
>                 URL: https://issues.apache.org/jira/browse/HBASE-13109
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 
> 13109-trunk-v4.txt, 13109-trunk.txt, nextIndexKVChange_new.patch
>
>
> I'm re-purposing this issue to add a heuristic as to when to SEEK and when to 
> SKIP Cells. This has come up in various issues, and I think I have a way to 
> finally fix this now. HBASE-9778, HBASE-12311, and friends are related.
> --- Old description ---
> This is a continuation of HBASE-9778.
> We've seen a scenario of a very slow scan over a region using a timerange 
> that happens to fall after the ts of any Cell in the region.
> Turns out we spend a lot of time seeking.
> Tested with a 5 column table, and the scan is 5x faster when the timerange 
> falls before all Cells' ts.
> We can use the lookahead hint introduced in HBASE-9778 to do opportunistic 
> SKIPing before we actually seek.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
