We found that even with many columns, and even when the filter matches the 
first column, SKIP is still faster than NEXT_ROW.
So either the reseek is extremely inefficient, or there is something else at 
play.

It might be worthwhile to have StoreScanner, upon SEEK_NEXT_ROW, first try the 
next N KVs (maybe N=10 or 20, or even bigger) to see whether we reach the next 
row, and do the reseek only if we have not reached it by then.
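
To make the idea concrete, here is a rough sketch of "try N sequential steps 
first, then fall back to a reseek". This is a toy model and not the actual 
StoreScanner code: the class NextRowSketch, its Cell type, the table() helper 
and the two counters are all made up for illustration, and the "reseek" is just 
a binary search over an in-memory list standing in for the real (more 
expensive) seek.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a sorted run of KVs. Hypothetical; not HBase's StoreScanner. */
final class NextRowSketch {

    /** Hypothetical stand-in for a KeyValue: only a row key and a qualifier. */
    static final class Cell {
        final String row;
        final String qualifier;

        Cell(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    private final List<Cell> cells; // assumed sorted by (row, qualifier)
    private int pos = 0;

    int nextCalls = 0; // cheap sequential steps taken
    int reseeks = 0;   // expensive seeks taken

    NextRowSketch(List<Cell> sortedCells) {
        this.cells = sortedCells;
    }

    /** Cell under the cursor, or null once the scanner is exhausted. */
    Cell current() {
        return pos < cells.size() ? cells.get(pos) : null;
    }

    /**
     * Move to the first cell of the next row. Tries up to maxSkips sequential
     * steps; only if the row has still not changed does it fall back to a reseek.
     */
    Cell seekNextRow(int maxSkips) {
        Cell start = current();
        if (start == null) {
            return null;
        }
        String row = start.row;
        for (int i = 0; i < maxSkips; i++) {
            pos++;
            nextCalls++;
            Cell c = current();
            if (c == null || !c.row.equals(row)) {
                return c; // reached the next row (or the end) without a reseek
            }
        }
        reseeks++;
        pos = firstIndexAfterRow(row, pos);
        return current();
    }

    /** Stand-in for the reseek: binary search for the first cell past the row. */
    private int firstIndexAfterRow(String row, int from) {
        int lo = from;
        int hi = cells.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cells.get(mid).row.compareTo(row) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Builds a sorted table with the given number of rows and columns per row. */
    static List<Cell> table(int rows, int columnsPerRow) {
        List<Cell> out = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < columnsPerRow; c++) {
                out.add(new Cell(String.format("row%04d", r),
                                 String.format("col%04d", c)));
            }
        }
        return out;
    }
}
```

In this model, a table with one column per row and N=10 reaches the next row 
with a single sequential step and no reseek at all, while a table with 50 
columns per row pays 10 sequential steps plus one reseek. That is the 
trade-off: a larger N avoids more reseeks on narrow rows, at the cost of wasted 
steps on wide ones.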

________________________________
From: lars hofhansl <[email protected]>
To: "[email protected]" <[email protected]>; lars hofhansl 
<[email protected]>
Sent: Friday, October 21, 2011 4:34 PM
Subject: Re: Strange performance behavior of SingleValColumnFilter

Maybe it even makes sense. When the scan is limited to one column and there is 
only one version, SKIP would skip to the next row.
But 10x slower for NEXT_ROW seems extreme.



________________________________
From: lars hofhansl <[email protected]>
To: hbase-dev <[email protected]>
Sent: Friday, October 21, 2011 3:49 PM
Subject: Strange performance behavior of SingleValColumnFilter

We have been doing some performance testing on HBase filters. One outcome was 
HBASE-4626 (which I fixed and committed last night).

Now we found a rather strange behavior with SingleColumnValueFilter. On our 
test cluster it is 10x slower than ValueFilter, even when we restrict the scan 
to just the one column we are filtering on and set filterIfMissing to true.
We are not seeing that with HBase in local mode, which points to some 
additional activity on the FS, which in HDFS would be slow compared to a local 
FS.


Indeed, it turns out the problem goes away when we replace all NEXT_ROW with 
SKIP in SingleColumnValueFilter.filterKeyValue: the performance is *much* 
better (on par with ValueFilter).


We're using something pretty close to trunk for our tests.
The tables are pretty wide, with only one version of each cell (and freshly 
major compacted).


I do not know this part of the code that well (yet) and was wondering if 
somebody could chime in. Maybe this is related to HFileV2?

I do recall there was something done to optimize reseeks. Generally I would 
have expected NEXT_ROW to be a major performance improvement.

Any ideas, comments, pointers?

Thanks.

-- Lars
