[ 
https://issues.apache.org/jira/browse/HBASE-17958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719318#comment-16719318
 ] 

Lars Hofhansl edited comment on HBASE-17958 at 12/12/18 7:12 PM:
-----------------------------------------------------------------

Looking at this one again, since it popped up in the profiler again.
I tried moving the check into StoreFileScanner or HFileScanner, but unfortunately 
much of the cost is incurred at the higher level (mostly KeyValueHeap).

I wanted to come back to the discussion about how often we need to check the 
next indexed key.
While it is true that the key *may* change during heap.next(), this is just a 
heuristic that uses the key we're looking at to estimate whether seeking or 
skipping would be more effective.
Right now we're always paying the cost of an extra compare per K/V to guard 
against the rare case when the scanner switches *and* the new scanner has 
many versions.

So I propose again moving that compare out of the loop and checking only once. 
That is good enough for a heuristic and is not needed for correctness, and in 
the case I'm seeing, this compare represents 40% of the time spent in 
StoreScanner.next().
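
To make the proposal concrete, here is a minimal sketch of the two loop shapes. 
It is not the HBase code: cells are reduced to a column name, the "heap" is a 
plain Deque, and SkipVsSeekSketch, skipLooksCheaper, skipCheckingEveryCell and 
skipCheckingOnce are hypothetical names chosen for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical sketch, not HBase code: a cell is just its column name and
// nextIndexedKey stands in for the first key of the next block.
class SkipVsSeekSketch {
    static int compares = 0; // counts heuristic compares, for illustration

    // Heuristic (assumed form): skipping looks cheaper when the next block
    // starts after the current column, i.e. the next column is nearby.
    static boolean skipLooksCheaper(String nextIndexedKey, String column) {
        compares++;
        return nextIndexedKey != null && nextIndexedKey.compareTo(column) > 0;
    }

    // Shape described as the current behaviour: one compare per skipped cell.
    static int skipCheckingEveryCell(Deque<String> heap, String nextIndexedKey) {
        String column = heap.peek();
        int skipped = 0;
        while (!heap.isEmpty() && heap.peek().equals(column)) {
            if (!skipLooksCheaper(nextIndexedKey, column)) {
                return skipped; // a real scanner would fall back to a seek here
            }
            heap.poll();
            skipped++;
        }
        return skipped;
    }

    // Proposed shape: decide once, then skip without further compares.
    static int skipCheckingOnce(Deque<String> heap, String nextIndexedKey) {
        String column = heap.peek();
        if (column == null || !skipLooksCheaper(nextIndexedKey, column)) {
            return 0; // a real scanner would seek instead
        }
        int skipped = 0;
        while (!heap.isEmpty() && heap.peek().equals(column)) {
            heap.poll();
            skipped++;
        }
        return skipped;
    }

    public static void main(String[] args) {
        Deque<String> a = new ArrayDeque<>(Arrays.asList("a", "a", "a", "b"));
        compares = 0;
        int s1 = skipCheckingEveryCell(a, "c");
        System.out.println("per-cell: skipped=" + s1 + " compares=" + compares);

        Deque<String> b = new ArrayDeque<>(Arrays.asList("a", "a", "a", "b"));
        compares = 0;
        int s2 = skipCheckingOnce(b, "c");
        System.out.println("once:     skipped=" + s2 + " compares=" + compares);
    }
}
```

With three versions of column "a", both variants skip the same three cells, 
but the per-cell variant does three compares and the check-once variant does 
one. The sketch keeps nextIndexedKey fixed, so it does not model the rare 
scanner switch discussed above.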

In short: this is a heuristic to try to guess whether SKIP or SEEK is better. It 
only has to be mostly right. I'll file a separate Jira.

[~Apache9], [~zghaobac]


> Avoid passing unexpected cell to ScanQueryMatcher when optimize SEEK to SKIP
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-17958
>                 URL: https://issues.apache.org/jira/browse/HBASE-17958
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Major
>             Fix For: 1.4.0, 2.0.0
>
>         Attachments: 0001-add-one-ut-testWithColumnCountGetFilter.patch, 
> 17958-add.txt, HBASE-17958-branch-1.patch, HBASE-17958-branch-1.patch, 
> HBASE-17958-branch-1.patch, HBASE-17958-branch-1.patch, HBASE-17958-v1.patch, 
> HBASE-17958-v2.patch, HBASE-17958-v3.patch, HBASE-17958-v4.patch, 
> HBASE-17958-v5.patch, HBASE-17958-v6.patch, HBASE-17958-v7.patch, 
> HBASE-17958-v7.patch
>
>
> {code}
> ScanQueryMatcher.MatchCode qcode = matcher.match(cell);
> qcode = optimize(qcode, cell);
> {code}
> The optimize method may change the MatchCode from SEEK_NEXT_COL/SEEK_NEXT_ROW 
> to SKIP, but the next cell is still passed to ScanQueryMatcher. This produces 
> wrong results with some filters, e.g. ColumnCountGetFilter, which simply 
> counts the number of columns. If a cell from the same column is passed to 
> this filter again, the count will be wrong. So we should avoid passing the 
> cell to ScanQueryMatcher when optimizing SEEK to SKIP.
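
The miscount described above can be illustrated with a small stand-in. This is 
a sketch under assumptions, not the real ColumnCountGetFilter: ColumnCountSketch 
and includedColumns are hypothetical names, and the filter is assumed to count 
every cell it is handed because it expects one cell per column.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a column-counting filter; not the HBase class.
class ColumnCountSketch {
    private final int limit;
    private int count = 0;

    ColumnCountSketch(int limit) {
        this.limit = limit;
    }

    // Counts every call, assuming each call carries a new column.
    boolean include(String column) {
        count++;
        return count <= limit;
    }

    // Feeds the cells to a fresh filter and returns the distinct columns kept.
    static Set<String> includedColumns(int limit, List<String> cells) {
        ColumnCountSketch filter = new ColumnCountSketch(limit);
        Set<String> kept = new LinkedHashSet<>();
        for (String column : cells) {
            if (filter.include(column)) {
                kept.add(column);
            }
        }
        return kept;
    }
}
```

With a limit of 2, the cells a, b, c yield columns a and b as expected. If a 
second cell of column a slips through after a SEEK was turned into a SKIP, the 
cells a, a, b yield only column a, because the duplicate used up the count.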



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
