Re: Strange performance behavior of SingleValColumnFilter

lars hofhansl Fri, 21 Oct 2011 21:22:45 -0700

No, it was a trunk build. The local tests were done with a build from today;
our test cluster runs a build that is 1 or 2 weeks old.


It seems it is just much cheaper to scan through a block that we already have,
or even to scan into the next block, than to reseek.



----- Original Message -----
From: Ted Yu <[email protected]>
To: [email protected]; lars hofhansl <[email protected]>
Cc: 
Sent: Friday, October 21, 2011 8:22 PM
Subject: Re: Strange performance behavior of SingleValColumnFilter

Was the following evaluation performed on 0.92?
Also, I assume you are using a ROWCOL bloom filter.
In TRUNK, Mikhail has put in lazy seek, which I think should help
performance.

Cheers

On Fri, Oct 21, 2011 at 7:34 PM, lars hofhansl <[email protected]> wrote:

> We found that even with many columns, and even when the filter matches the
> first column, SKIP is still faster than NEXT_ROW.
> So either the reseek is extremely inefficient, or there is something else
> at play.
>
> It might be worthwhile to have StoreScanner, upon SEEK_NEXT_ROW, try the
> next N KVs (maybe N=10 or 20, or even bigger) to see whether we get to the
> next row, and do the reseek only if we did not reach it.
>
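The "try the next N KVs before reseeking" idea can be sketched as a toy model. This is not HBase's StoreScanner: `Cell`, `ToyScanner` and `seekNextRow(int maxNexts)` are hypothetical names, the store is assumed to be an in-memory sorted list, and the "reseek" is stood in for by a binary search. The counters only make the two paths visible; they say nothing about real I/O costs.

```java
import java.util.List;

/** Hypothetical cell: just a row key and a column name, kept in sorted order. */
final class Cell {
    final String row;
    final String column;

    Cell(String row, String column) {
        this.row = row;
        this.column = column;
    }
}

/**
 * Toy scanner over a sorted list of cells (an assumption, not HBase code).
 * On "seek to next row" it first tries up to maxNexts cheap next() steps and
 * only falls back to a reseek if the next row was not reached.
 */
final class ToyScanner {
    private final List<Cell> cells;
    private int pos = 0;
    int nextCalls = 0; // cheap single-cell steps taken
    int reseeks = 0;   // times the fallback reseek was used

    ToyScanner(List<Cell> cells) {
        this.cells = cells;
    }

    /** Current cell, or null once the scanner is exhausted. */
    Cell peek() {
        return pos < cells.size() ? cells.get(pos) : null;
    }

    void seekNextRow(int maxNexts) {
        Cell current = peek();
        if (current == null) {
            return;
        }
        String row = current.row;
        // Cheap path: step forward one cell at a time, at most maxNexts times.
        for (int i = 0; i < maxNexts; i++) {
            pos++;
            nextCalls++;
            Cell c = peek();
            if (c == null || !c.row.equals(row)) {
                return; // reached the next row (or the end) without reseeking
            }
        }
        // Fallback "reseek": binary search for the first cell whose row
        // sorts strictly after the current row.
        reseeks++;
        int lo = pos;
        int hi = cells.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cells.get(mid).row.compareTo(row) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        pos = lo;
    }
}
```

With two rows of five columns each, `seekNextRow(10)` reaches the second row after five steps and no reseek, while `seekNextRow(2)` gives up after two steps and reseeks. Picking N is the trade-off the message describes: large enough to cover typical row widths, small enough that a very wide row still falls back to the reseek.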
> ________________________________
> From: lars hofhansl <[email protected]>
> To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
> Sent: Friday, October 21, 2011 4:34 PM
> Subject: Re: Strange performance behavior of SingleValColumnFilter
>
> Maybe it even makes sense. When the scan is limited to one column and there
> is only one version, SKIP would skip to the next row.
> But 10x slower for NEXT_ROW seems extreme.
>
>
>
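Why SKIP already behaves like NEXT_ROW in that setup can be shown with a tiny hypothetical model (not HBase code): cells are assumed to be sorted `row/column` strings, SKIP advances by one cell, and NEXT_ROW advances to the first cell of the following row. `SkipVsNextRow` and its methods are illustrative names only.

```java
import java.util.List;

/** Hypothetical toy model of where each return code leaves the scan. */
final class SkipVsNextRow {
    /** Row part of a "row/column" key. */
    static String rowOf(String key) {
        return key.substring(0, key.indexOf('/'));
    }

    /** SKIP: move past the current cell only. */
    static int afterSkip(int pos) {
        return pos + 1;
    }

    /** NEXT_ROW: move past every remaining cell of the current row. */
    static int afterNextRow(List<String> keys, int pos) {
        String row = rowOf(keys.get(pos));
        int p = pos + 1;
        while (p < keys.size() && rowOf(keys.get(p)).equals(row)) {
            p++;
        }
        return p;
    }
}
```

When every row holds exactly one cell, both codes land on the same position; in a wider row they diverge. In this model the destination is identical, which is consistent with the thread's suspicion that the cost lies in how NEXT_ROW is carried out (the reseek) rather than in where the scan ends up.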
> ________________________________
> From: lars hofhansl <[email protected]>
> To: hbase-dev <[email protected]>
> Sent: Friday, October 21, 2011 3:49 PM
> Subject: Strange performance behavior of SingleValColumnFilter
>
> We have been doing some performance testing on HBase filters. One outcome
> was HBASE-4626 (which I fixed and committed last night).
>
> We have now found some rather strange behavior with SingleColumnValueFilter.
> On our test cluster it is 10x slower than ValueFilter, even when we restrict
> the scan to just the one column we are filtering on and set filterIfMissing
> to true.
> We do not see this with HBase in local mode, which points to some
> additional file-system activity that would be slow on HDFS compared to a
> local FS.
>
>
> Indeed, it turns out the problem goes away: when we replace every NEXT_ROW
> with SKIP in SingleColumnValueFilter.filterKeyValue, the performance is
> *much* better (on par with ValueFilter).
>
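The experiment can be pictured with a heavily simplified, hypothetical model of a single-column-value filter's per-cell decision. It is not the HBase source: `ToyColumnValueFilter`, `filterCell` and the `onMismatch` switch are illustrative names, and row-level filtering (such as `filterIfMissing`) is left out. The point is only where the NEXT_ROW-vs-SKIP choice sits: once the filtered column has been seen and its value did not match, the rest of the row is rejected.

```java
/** Return codes modeled after the ones discussed in the thread. */
enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

/**
 * Hypothetical, simplified single-column-value filter. onMismatch is
 * NEXT_ROW for the original behavior or SKIP for the experiment.
 */
final class ToyColumnValueFilter {
    private final String column;
    private final String expectedValue;
    private final ReturnCode onMismatch;
    private boolean foundColumn = false;
    private boolean matchedColumn = false;

    ToyColumnValueFilter(String column, String expectedValue, ReturnCode onMismatch) {
        this.column = column;
        this.expectedValue = expectedValue;
        this.onMismatch = onMismatch;
    }

    /** Clears per-row state; call before the first cell of each row. */
    void reset() {
        foundColumn = false;
        matchedColumn = false;
    }

    ReturnCode filterCell(String cellColumn, String value) {
        if (matchedColumn) {
            return ReturnCode.INCLUDE; // row already accepted
        }
        if (foundColumn) {
            return onMismatch;         // row already rejected
        }
        if (!cellColumn.equals(column)) {
            return ReturnCode.INCLUDE; // other column: passed through (row-level checks omitted)
        }
        foundColumn = true;
        if (!value.equals(expectedValue)) {
            return onMismatch;         // value mismatch: reject the rest of the row
        }
        matchedColumn = true;
        return ReturnCode.INCLUDE;
    }
}
```

With `onMismatch = SKIP` the scanner is only asked to step past one cell at a time, which in a one-column, one-version scan already reaches the next row; with `NEXT_ROW` it is asked to jump, which the thread suspects goes through the slower reseek path.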
>
> We're using something pretty close to trunk for our tests.
> The tables are pretty wide, with only one version of each cell (and freshly
> major compacted).
>
>
> I do not know this part of the code that well (yet) and was wondering if
> somebody could chime in. Maybe this is related to HFileV2?
>
> I do recall that something was done to optimize reseeks. Generally, I would
> have expected NEXT_ROW to be a major performance improvement.
>
> Any ideas, comments, pointers?
>
> Thanks.
>
> -- Lars
>
