Yes. Should be off by default.
St.Ack

On Wed, Oct 26, 2011 at 10:43 AM, lars hofhansl <[email protected]> wrote:
> Should there be an option to disable data block caching and only allow index
> block caching?
> For some analytical setups that might make sense.
> (obviously, the same can be achieved by setting cacheBlocks to false in every
> Scan object)
>
>
> ----- Original Message -----
> From: lars hofhansl <[email protected]>
> To: "[email protected]" <[email protected]>; lars hofhansl
> <[email protected]>
> Cc:
> Sent: Tuesday, October 25, 2011 2:22 PM
> Subject: Re: Strange performance behavior of SingleColumnValueFilter
>
> It turns out that from other tests we did we had a stray
>
> <property>
> <name>hfile.block.cache.size</name>
> <value>0</value>
> </property>
>
> in our config. D'oh...
>
> When we removed that, the performance of SCVF was on par with ValueFilter.
>
> Setting cacheBlocks on the Scan object had almost no effect, so this must be
> related to the caching of Index Blocks.
> NEXT_ROW forces re-reading of Index Blocks it seems, whereas SKIP does not.
>
> So in summary:
> When hfile.block.cache.size=0, returning NEXT_ROW from a ScanQueryMatcher can
> be significantly slower than returning SKIP.
>
> -- Lars
>
>
> ----- Original Message -----
> From: lars hofhansl <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc:
> Sent: Saturday, October 22, 2011 5:16 PM
> Subject: Re: Strange performance behavior of SingleValColumnFilter
>
> Thanks N.
>
> I do not think the time is lost in the memstore. We're working with fully
> compacted tables and do no updates during the read testing.
>
> We'll be spending more time to track this down on Monday.
>
> -- Lars
>
> ________________________________
> From: N Keywal <[email protected]>
> To: [email protected]
> Sent: Saturday, October 22, 2011 2:53 PM
> Subject: Re: Strange performance behavior of SingleValColumnFilter
>
> Hi,
>
> I made a change recently on this. It was to fix a consistency bug rather
> than improve the performances, but on my test the performances were actually
> improved as well. It was for MemStore only. Is the time lost on the memstore
> or in the persisted related part?
>
> Cheers,
>
> N.
>
> On Sat, Oct 22, 2011 at 6:22 AM, lars hofhansl <[email protected]> wrote:
>
>> No it was a trunk build. The local tests I did with a build from today.
>> Our test cluster is 1 or 2 weeks old.
>>
>> It seems it is just much cheaper to scan through a block that we already
>> have, or even to scan into the next block, than to reseek.
>>
>>
>> ----- Original Message -----
>> From: Ted Yu <[email protected]>
>> To: [email protected]; lars hofhansl <[email protected]>
>> Cc:
>> Sent: Friday, October 21, 2011 8:22 PM
>> Subject: Re: Strange performance behavior of SingleValColumnFilter
>>
>> Was the following evaluation performed on 0.92 ?
>> Also, I assume you use ROWCOL bloom filter.
>> In TRUNK, Mikhail has put in lazy seek which I think should help
>> performance.
>>
>> Cheers
>>
>> On Fri, Oct 21, 2011 at 7:34 PM, lars hofhansl <[email protected]>
>> wrote:
>>
>> > We found that even with many columns, and even when the filter matches
>> > the first column, SKIP is still faster than NEXT_ROW.
>> > So either the reseek is extremely inefficient, or there is something else
>> > at play.
>> >
>> > It might be worthwhile to have StoreScanner upon SEEK_NEXT_ROW try the
>> > next N KVs (maybe N=10 or 20 or even bigger) to see if we
>> > get to the next row, and only if we didn't reach the next row do the
>> > reseek.
>> >
>> > ________________________________
>> > From: lars hofhansl <[email protected]>
>> > To: "[email protected]" <[email protected]>; lars hofhansl <
>> > [email protected]>
>> > Sent: Friday, October 21, 2011 4:34 PM
>> > Subject: Re: Strange performance behavior of SingleValColumnFilter
>> >
>> > Maybe it even makes sense. When the scan is limited to one column and
>> > there is only one version, SKIP would skip to the next row.
>> > But 10x slower for NEXT_ROW seems extreme.
>> >
>> >
>> > ________________________________
>> > From: lars hofhansl <[email protected]>
>> > To: hbase-dev <[email protected]>
>> > Sent: Friday, October 21, 2011 3:49 PM
>> > Subject: Strange performance behavior of SingleValColumnFilter
>> >
>> > We have been doing some performance testing on HBase filters. One outcome
>> > was HBASE-4626 (which I fixed and committed yesterday night).
>> >
>> > Now we found a rather strange behavior with SingleColumnValueFilter. On
>> > our test cluster it is 10x slower than ValueFilter, even when we restrict
>> > the scan to just the one column we are filtering on and set
>> > filterIfMissing to true.
>> > We are not seeing that with HBase in local mode, which points to some
>> > additional activity on the FS, which in HDFS would be slow compared to a
>> > local FS.
>> >
>> > Indeed it turns out the problem goes away when we replace all NEXT_ROW
>> > with SKIP in SingleColumnValueFilter.filterKeyValue; the performance is
>> > *much* better (on par with ValueFilter).
>> >
>> > We're using something pretty close to trunk for our tests.
>> > The tables are pretty wide, only one version of each cell (and freshly
>> > major compacted).
>> >
>> > I do not know this part of the code that well (yet) and was wondering if
>> > somebody could chime in. Maybe this is related to HFileV2?
>> >
>> > I do recall there was something done to optimize reseeks. Generally I
>> > would have expected NEXT_ROW to be a major performance improvement.
>> >
>> > Any ideas, comments, pointers?
>> >
>> > Thanks.
>> >
>> > -- Lars
>> >
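
For reference, the idea floated earlier in the thread (on SEEK_NEXT_ROW, try the next N KVs first and only reseek if the next row was not reached) can be sketched in plain Java. This is only an illustration under stated assumptions, not HBase code: NextRowSketch, Cell, Result and advanceToNextRow are hypothetical names, a sorted in-memory list stands in for the KVs of a store file, and a binary search stands in for the reseek.

```java
import java.util.ArrayList;
import java.util.List;

class NextRowSketch {
    /** Hypothetical stand-in for a KeyValue; only the row key matters here. */
    static final class Cell {
        final String row;
        final String qualifier;
        Cell(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    /** Index of the first cell of the next row (or cells.size()), plus which path was taken. */
    static final class Result {
        final int index;
        final boolean reseeked;
        Result(int index, boolean reseeked) {
            this.index = index;
            this.reseeked = reseeked;
        }
    }

    /**
     * Moves from cells.get(pos) to the first cell of the next row.
     * Steps through at most maxSteps following cells (the "N" from the thread)
     * and falls back to a search only if the next row was not reached.
     * Assumes cells is sorted by row.
     */
    static Result advanceToNextRow(List<Cell> cells, int pos, int maxSteps) {
        String currentRow = cells.get(pos).row;
        int limit = Math.min(cells.size(), pos + 1 + maxSteps);
        // Cheap path: look at the cells we would read next anyway.
        for (int i = pos + 1; i < limit; i++) {
            if (!cells.get(i).row.equals(currentRow)) {
                return new Result(i, false);
            }
        }
        if (limit == cells.size()) {
            return new Result(cells.size(), false); // ran off the end, no more rows
        }
        // Expensive path: binary search as a stand-in for the reseek.
        int lo = limit;
        int hi = cells.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cells.get(mid).row.compareTo(currentRow) > 0) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return new Result(lo, true);
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        for (int q = 0; q < 5; q++) cells.add(new Cell("r1", "q" + q));
        for (int q = 0; q < 2; q++) cells.add(new Cell("r2", "q" + q));
        Result wide = advanceToNextRow(cells, 0, 10);
        Result narrow = advanceToNextRow(cells, 0, 2);
        System.out.println(wide.index + " reseeked=" + wide.reseeked);
        System.out.println(narrow.index + " reseeked=" + narrow.reseeked);
    }
}
```

With a row of five cells, a budget of 10 steps reaches the next row without the fallback, while a budget of 2 has to take the search path. Whether such a budget would pay off in a real StoreScanner depends on row width and on the cost of the reseek, which this sketch does not model.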
