No, it was a trunk build. I did the local tests with a build from today. Our test cluster is 1 or 2 weeks old.
It seems it is just much cheaper to scan through a block that we already have, or even to scan into the next block, than to reseek.

----- Original Message -----
From: Ted Yu <[email protected]>
To: [email protected]; lars hofhansl <[email protected]>
Cc:
Sent: Friday, October 21, 2011 8:22 PM
Subject: Re: Strange performance behavior of SingleValColumnFilter

Was the following evaluation performed on 0.92?
Also, I assume you use ROWCOL bloom filter.

In TRUNK, Mikhail has put in lazy seek which I think should help performance.

Cheers

On Fri, Oct 21, 2011 at 7:34 PM, lars hofhansl <[email protected]> wrote:

> We found that even with many columns, and even when the filter matches the
> first column, SKIP is still faster than NEXT_ROW.
> So either the reseek is extremely inefficient, or there is something else
> at play.
>
> It might be worthwhile to have StoreScanner upon SEEK_NEXT_ROW try the next
> N KVs (maybe N=10 or 20 or even bigger) to see if we
> get to the next row, and only if we didn't reach the next row do the
> reseek.
>
> ________________________________
> From: lars hofhansl <[email protected]>
> To: "[email protected]" <[email protected]>; lars hofhansl <
> [email protected]>
> Sent: Friday, October 21, 2011 4:34 PM
> Subject: Re: Strange performance behavior of SingleValColumnFilter
>
> Maybe it even makes sense. When the scan is limited to one column and there
> is only one version, SKIP would skip to the next row.
> But 10x slower for NEXT_ROW seems extreme.
>
>
>
> ________________________________
> From: lars hofhansl <[email protected]>
> To: hbase-dev <[email protected]>
> Sent: Friday, October 21, 2011 3:49 PM
> Subject: Strange performance behavior of SingleValColumnFilter
>
> We have been doing some performance testing on HBase filters. One outcome
> was HBASE-4626 (which I fixed and committed yesterday night).
>
> Now we found a rather strange behavior with SingleColumnValueFilter. On our
> test cluster it is 10x slower than ValueFilter, even when we restrict the
> scan to just the one column we are filtering on and set filterIfMissing to
> true.
> We are not seeing that with HBase in local mode, which points to some
> additional activity on the FS, which in HDFS would be slow compared to a
> local FS.
>
>
> Indeed it turns out the problem goes away when we replace all NEXT_ROW with
> SKIP in SingleColumnValueFilter.filterKeyValue the performance is *much*
> better (on par with ValueFilter).
>
>
> We're using something pretty close to trunk for our tests.
> The tables are pretty wide, only one version of each cells (and freshly
> major compacted).
>
>
> I do not know this part of the code that well (yet) and was wondering if
> somebody could chime in. Maybe this is related to HFileV2?
>
> I do recall there was something done to optimize reseeks. Generally I would
> have expected NEXT_ROW to be a major performance improvement.
>
> Any ideas, comments, pointers?
>
> Thanks.
>
> -- Lars
>
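
For illustration, here is a rough sketch of the "try the next N KVs before reseeking" idea from the quoted message. It is not HBase code: the class NextRowSketch, the Cell type, the methods nextRow and reseekToNextRow, and the reseeks counter are all hypothetical names, and a binary search over an in-memory list stands in for the real reseek. It only shows the control flow, under the assumption that stepping over a few cells is cheaper than a seek.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these names exist in HBase. */
public class NextRowSketch {

    /** Minimal stand-in for a KeyValue: just a row key and a column name. */
    public static final class Cell {
        final String row;
        final String col;
        Cell(String row, String col) { this.row = row; this.col = col; }
    }

    /** Counts how often the (assumed expensive) reseek path was taken. */
    public static int reseeks = 0;

    /** Stand-in for a reseek: binary search for the first cell after pos in a later row. */
    static int reseekToNextRow(List<Cell> cells, int pos) {
        reseeks++;
        String row = cells.get(pos).row;
        int lo = pos + 1, hi = cells.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cells.get(mid).row.compareTo(row) > 0) hi = mid; else lo = mid + 1;
        }
        return lo;
    }

    /**
     * Returns the index of the first cell of the next row, or cells.size() if none.
     * Steps over at most maxSkips cells first and only then falls back to the reseek.
     */
    public static int nextRow(List<Cell> cells, int pos, int maxSkips) {
        String row = cells.get(pos).row;
        int i = pos + 1;
        for (int skipped = 0; skipped < maxSkips && i < cells.size(); skipped++, i++) {
            if (!cells.get(i).row.equals(row)) return i; // reached the next row cheaply
        }
        if (i >= cells.size()) return cells.size();      // ran off the end, nothing to seek to
        return reseekToNextRow(cells, i - 1);            // still in the same row: reseek
    }

    /** Sample data: a 3-column row, a 25-column row, and a 1-column row. */
    public static List<Cell> sampleCells() {
        List<Cell> cells = new ArrayList<>();
        addRow(cells, "r1", 3);
        addRow(cells, "r2", 25);
        addRow(cells, "r3", 1);
        return cells;
    }

    private static void addRow(List<Cell> cells, String row, int columns) {
        for (int c = 0; c < columns; c++) cells.add(new Cell(row, "c" + c));
    }

    public static void main(String[] args) {
        List<Cell> cells = sampleCells();
        int pos = 0;
        while (pos < cells.size()) {
            System.out.println("row " + cells.get(pos).row + " starts at index " + pos);
            pos = nextRow(cells, pos, 10);
        }
        System.out.println("reseeks: " + reseeks);
    }
}
```

With N=10 and the sample data, only the 25-column row falls through to the reseek; the narrow rows are handled by stepping alone. What N should be in practice would have to be measured.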
