Was the following evaluation performed on 0.92? Also, I assume you are using a ROWCOL bloom filter. In TRUNK, Mikhail has put in lazy seek, which I think should help performance.
Cheers

On Fri, Oct 21, 2011 at 7:34 PM, lars hofhansl <[email protected]> wrote:

> We found that even with many columns, and even when the filter matches the
> first column, SKIP is still faster than NEXT_ROW.
> So either the reseek is extremely inefficient, or there is something else
> at play.
>
> It might be worthwhile to have StoreScanner, upon SEEK_NEXT_ROW, try the
> next N KVs (maybe N=10 or 20 or even bigger) to see if we get to the next
> row, and do the reseek only if we didn't reach the next row.
>
> ________________________________
> From: lars hofhansl <[email protected]>
> To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
> Sent: Friday, October 21, 2011 4:34 PM
> Subject: Re: Strange performance behavior of SingleValColumnFilter
>
> Maybe it even makes sense. When the scan is limited to one column and there
> is only one version, SKIP would skip to the next row.
> But 10x slower for NEXT_ROW seems extreme.
>
> ________________________________
> From: lars hofhansl <[email protected]>
> To: hbase-dev <[email protected]>
> Sent: Friday, October 21, 2011 3:49 PM
> Subject: Strange performance behavior of SingleValColumnFilter
>
> We have been doing some performance testing on HBase filters. One outcome
> was HBASE-4626 (which I fixed and committed yesterday night).
>
> Now we found a rather strange behavior with SingleColumnValueFilter. On our
> test cluster it is 10x slower than ValueFilter, even when we restrict the
> scan to just the one column we are filtering on and set filterIfMissing to
> true.
> We are not seeing that with HBase in local mode, which points to some
> additional activity on the FS, which in HDFS would be slow compared to a
> local FS.
>
> Indeed, it turns out the problem goes away when we replace all NEXT_ROW
> with SKIP in SingleColumnValueFilter.filterKeyValue: the performance is
> then *much* better (on par with ValueFilter).
>
> We're using something pretty close to trunk for our tests.
> The tables are pretty wide, with only one version of each cell (and freshly
> major compacted).
>
> I do not know this part of the code that well (yet) and was wondering if
> somebody could chime in. Maybe this is related to HFileV2?
>
> I do recall there was something done to optimize reseeks. Generally I would
> have expected NEXT_ROW to be a major performance improvement.
>
> Any ideas, comments, pointers?
>
> Thanks.
>
> -- Lars
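
For illustration only, the "try the next N KVs before reseeking" idea from the quoted message could be sketched roughly as below. This is a toy model and not HBase code: the sorted int array stands in for KVs ordered by row, a binary search stands in for the reseek, and every name here (NextRowSketch, advanceToNextRow, reseekToNextRow, maxSteps) is hypothetical. It only counts how many sequential steps and reseeks each row transition needs; it says nothing about their real relative cost on HDFS.

```java
class NextRowSketch {
    // Counters so the number of sequential steps and reseeks can be compared.
    static int steps = 0;
    static int reseeks = 0;

    /** Stand-in for a reseek: binary search for the first cell past the current row. */
    static int reseekToNextRow(int[] cellRows, int pos) {
        reseeks++;
        int currentRow = cellRows[pos];
        int lo = pos + 1;
        int hi = cellRows.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cellRows[mid] <= currentRow) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo; // index of the next row's first cell, or cellRows.length at the end
    }

    /** Step through up to maxSteps cells first; reseek only if still inside the same row. */
    static int advanceToNextRow(int[] cellRows, int pos, int maxSteps) {
        int currentRow = cellRows[pos];
        int i = pos;
        for (int n = 0; n < maxSteps; n++) {
            i++;
            steps++;
            if (i >= cellRows.length || cellRows[i] != currentRow) {
                return i; // reached the next row (or the end) without a reseek
            }
        }
        return reseekToNextRow(cellRows, i);
    }

    /** Build a table in which cell i belongs to row i / colsPerRow. */
    static int[] buildTable(int rowCount, int colsPerRow) {
        int[] cellRows = new int[rowCount * colsPerRow];
        for (int i = 0; i < cellRows.length; i++) {
            cellRows[i] = i / colsPerRow;
        }
        return cellRows;
    }

    /** Visit the first cell of every row; returns {rowsVisited, steps, reseeks}. */
    static int[] scanFirstCells(int[] cellRows, int maxSteps) {
        steps = 0;
        reseeks = 0;
        int rowsVisited = 0;
        int pos = 0;
        while (pos < cellRows.length) {
            rowsVisited++;
            pos = advanceToNextRow(cellRows, pos, maxSteps);
        }
        return new int[] {rowsVisited, steps, reseeks};
    }

    public static void main(String[] args) {
        int[] narrow = scanFirstCells(buildTable(100, 3), 10);
        int[] wide = scanFirstCells(buildTable(100, 50), 10);
        System.out.println("narrow: rows=" + narrow[0] + " steps=" + narrow[1] + " reseeks=" + narrow[2]);
        System.out.println("wide:   rows=" + wide[0] + " steps=" + wide[1] + " reseeks=" + wide[2]);
    }
}
```

In this model, rows with fewer than N cells never trigger a reseek, while wide rows pay at most N extra steps before falling back to it. Whether that trade-off helps in practice would depend on what a real reseek costs, which is exactly the open question in this thread.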
