Re: Strange performance behavior of SingleColumnValueFilter

Stack Wed, 26 Oct 2011 10:53:55 -0700

Yes.  Should be off by default.
St.Ack


On Wed, Oct 26, 2011 at 10:43 AM, lars hofhansl <[email protected]> wrote:
> Should there be an option to disable data block caching and only allow index 
> block caching?
> For some analytical setups that might make sense.
> (obviously, the same can be achieved by setting cacheBlocks to false in every 
> Scan object)
>
>
>
> ----- Original Message -----
> From: lars hofhansl <[email protected]>
> To: "[email protected]" <[email protected]>; lars hofhansl 
> <[email protected]>
> Cc:
> Sent: Tuesday, October 25, 2011 2:22 PM
> Subject: Re: Strange performance behavior of SingleColumnValueFilter
>
> It turns out that from other tests we did we had a stray
>
>
> <property>
>     <name>hfile.block.cache.size</name>
>     <value>0</value>
> </property>
>
>
> in our config. D'oh...
>
> When we removed that, the performance of SCVF was on par with ValueFilter.
>
> Setting cacheBlocks on the Scan object had almost no effect, so this must be
> related to the caching of Index Blocks.
> NEXT_ROW forces re-reading of Index Blocks it seems, whereas SKIP does not.
>
> So in summary:
> When hfile.block.cache.size=0, returning NEXT_ROW from a ScanQueryMatcher can 
> be significantly slower than returning SKIP.
>
> -- Lars
>
>
> ----- Original Message -----
> From: lars hofhansl <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc:
> Sent: Saturday, October 22, 2011 5:16 PM
> Subject: Re: Strange performance behavior of SingleValColumnFilter
>
> Thanks N.
>
> I do not think the time is lost in the memstore. We're working with fully
> compacted tables and do no updates during the read testing.
>
> We'll be spending more time to track this down on Monday.
>
>
> -- Lars
>
> ________________________________
> From: N Keywal <[email protected]>
> To: [email protected]
> Sent: Saturday, October 22, 2011 2:53 PM
> Subject: Re: Strange performance behavior of SingleValColumnFilter
>
> Hi,
>
> I made a change recently on this. It was to fix a consistency bug rather
> than to improve performance, but in my test performance actually improved
> as well. It was for MemStore only. Is the time lost in the memstore
> or in the persisted part?
>
> Cheers,
>
> N.
>
> On Sat, Oct 22, 2011 at 6:22 AM, lars hofhansl <[email protected]> wrote:
>
>> No, it was a trunk build. I did the local tests with a build from today.
>> Our test cluster is 1 or 2 weeks old.
>>
>> It seems it is just much cheaper to scan through a block that we already
>> have, or even to scan into the next block, than to reseek.
>>
>>
>>
>> ----- Original Message -----
>> From: Ted Yu <[email protected]>
>> To: [email protected]; lars hofhansl <[email protected]>
>> Cc:
>> Sent: Friday, October 21, 2011 8:22 PM
>> Subject: Re: Strange performance behavior of SingleValColumnFilter
>>
>> Was the following evaluation performed on 0.92 ?
>> Also, I assume you use ROWCOL bloom filter.
>> In TRUNK, Mikhail has put in lazy seek which I think should help
>> performance.
>>
>> Cheers
>>
>> On Fri, Oct 21, 2011 at 7:34 PM, lars hofhansl <[email protected]>
>> wrote:
>>
>> > We found that even with many columns, and even when the filter matches
>> the
>> > first column, SKIP is still faster than NEXT_ROW.
>> > So either the reseek is extremely inefficient, or there is something else
>> > at play.
>> >
>> > It might be worthwhile to have StoreScanner upon SEEK_NEXT_ROW try the
>> next
>> > N KVs (maybe N=10 or 20 or even bigger) to see if we
>> > get to the next row, and only if we didn't reach the next row do the
>> > reseek.
>> >
>> > ________________________________
>> > From: lars hofhansl <[email protected]>
>> > To: "[email protected]" <[email protected]>; lars hofhansl <
>> > [email protected]>
>> > Sent: Friday, October 21, 2011 4:34 PM
>> > Subject: Re: Strange performance behavior of SingleValColumnFilter
>> >
>> > Maybe it even makes sense. When the scan is limited to one column and
>> there
>> > is only one version, SKIP would skip to the next row.
>> > But 10x slower for NEXT_ROW seems extreme.
>> >
>> >
>> >
>> > ________________________________
>> > From: lars hofhansl <[email protected]>
>> > To: hbase-dev <[email protected]>
>> > Sent: Friday, October 21, 2011 3:49 PM
>> > Subject: Strange performance behavior of SingleValColumnFilter
>> >
>> > We have been doing some performance testing on HBase filters. One outcome
>> > was HBASE-4626 (which I fixed and committed yesterday night).
>> >
>> > Now we found a rather strange behavior with SingleColumnValueFilter. On
>> our
>> > test cluster it is 10x slower than ValueFilter, even when we restrict the
>> > scan to just the one column we are filtering on and set filterIfMissing
>> to
>> > true.
>> > We are not seeing that with HBase in local mode, which points to some
>> > additional activity on the FS, which in HDFS would be slow compared to a
>> > local FS.
>> >
>> >
>> > Indeed, it turns out that when we replace all NEXT_ROW with SKIP in
>> > SingleColumnValueFilter.filterKeyValue, the problem goes away: the
>> > performance is *much* better (on par with ValueFilter).
>> >
>> >
>> > We're using something pretty close to trunk for our tests.
>> > The tables are pretty wide, with only one version of each cell (and
>> > freshly major compacted).
>> >
>> >
>> > I do not know this part of the code that well (yet) and was wondering if
>> > somebody could chime in. Maybe this is related to HFileV2?
>> >
>> > I do recall there was something done to optimize reseeks. Generally I
>> would
>> > have expected NEXT_ROW to be a major performance improvement.
>> >
>> > Any ideas, comments, pointers?
>> >
>> > Thanks.
>> >
>> > -- Lars
>> >
>>
>>
>
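
For illustration only (this is not code from the thread or from HBase): the
idea floated above, of having the scanner try the next N KVs on
SEEK_NEXT_ROW and reseek only if the next row was not reached, could be
sketched roughly as follows. The class LookAheadSketch, the method
seekNextRow, and the reduction of each KV to its row key are all
hypothetical simplifications. The "reseek" here is a stand-in that only
records that the expensive path was taken.

```java
import java.util.List;

// Hypothetical model: every KV is reduced to its row key, and the list is
// sorted by row, as cells in a store file would be.
final class LookAheadSketch {

    /** Where the scanner ended up, and whether the costly reseek path was used. */
    static final class Advance {
        final int index;        // index of the first KV of the next row (or list size at end)
        final boolean reseeked; // true if the look-ahead budget ran out

        Advance(int index, boolean reseeked) {
            this.index = index;
            this.reseeked = reseeked;
        }
    }

    /**
     * Move from the KV at pos to the first KV of the next row.
     * Up to maxLookAhead KVs are examined with plain sequential steps;
     * only if the row boundary is still not reached is a reseek simulated.
     */
    static Advance seekNextRow(List<String> rows, int pos, int maxLookAhead) {
        String current = rows.get(pos);
        int i = pos + 1;
        for (int steps = 0; steps < maxLookAhead && i < rows.size(); steps++, i++) {
            if (!rows.get(i).equals(current)) {
                return new Advance(i, false); // next row reached cheaply
            }
        }
        if (i >= rows.size()) {
            return new Advance(rows.size(), false); // ran off the end of the data
        }
        // Budget exhausted: stand-in for a real reseek (which would consult
        // the block index). Here we just locate the next row boundary.
        int j = i;
        while (j < rows.size() && rows.get(j).equals(current)) {
            j++;
        }
        return new Advance(j, true);
    }
}
```

With rows [r1, r1, r1, r2, r2, r3] and the scanner at index 0, a budget of
10 reaches r2 at index 3 without a reseek, while a budget of 1 falls back
to the reseek path. Whether such a budget pays off in practice depends on
the relative cost of a reseek versus sequential next() calls, which is
exactly what the measurements in this thread call into question.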
