Re: Would ROWCOL Bloom filter help in Scan

Jerry He Sat, 05 Dec 2015 12:55:36 -0800

>
> It has to have a row on it, right?


Only the column name would be the key in the bloom, and it would apply to
explicit-column scans only.
A StoreFileScanner could be skipped after this bloom check.
This is only high-level thinking.  No?  It doesn't work this way?  I must be
missing something then.
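
A minimal sketch of the idea, in plain Java rather than HBase code. The
ColumnBloomSketch class, its hash scheme, and the filesToScan helper are all
hypothetical names made up for illustration. It assumes one bloom per store
file, keyed on the column qualifier only, and drops a file from the scan when
none of the requested columns might be in it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy Bloom filter keyed on the column qualifier only (hypothetical, not HBase code).
class ColumnBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ColumnBloomSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Double hashing: the i-th probe position is derived from two base hashes.
    private int probe(String qualifier, int i) {
        int h1 = qualifier.hashCode();
        int h2 = 0x9E3779B9;
        for (char c : qualifier.toCharArray()) {
            h2 = (h2 ^ c) * 16777619;
        }
        h2 |= 1; // force a nonzero step between probes
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String qualifier) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(probe(qualifier, i));
        }
    }

    // False means definitely absent; true means possibly present.
    boolean mightContain(String qualifier) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(probe(qualifier, i))) {
                return false;
            }
        }
        return true;
    }

    // Keep only the store files whose bloom might hold a requested column.
    static List<String> filesToScan(Map<String, ColumnBloomSketch> blooms,
                                    List<String> columns) {
        List<String> keep = new ArrayList<>();
        for (Map.Entry<String, ColumnBloomSketch> e : blooms.entrySet()) {
            for (String col : columns) {
                if (e.getValue().mightContain(col)) {
                    keep.add(e.getKey());
                    break;
                }
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        ColumnBloomSketch fileA = new ColumnBloomSketch(1024, 3);
        fileA.add("c1");
        fileA.add("c2");
        ColumnBloomSketch fileB = new ColumnBloomSketch(1024, 3);
        fileB.add("c3");
        Map<String, ColumnBloomSketch> blooms = new LinkedHashMap<>();
        blooms.put("fileA", fileA);
        blooms.put("fileB", fileB);
        // fileB is always kept; fileA is dropped unless its bloom gives a false positive.
        System.out.println(filesToScan(blooms, Arrays.asList("c3")));
    }
}
```

Such a check only helps when a store file lacks every requested column; if a
column appears in at least one row of every file, nothing is skipped.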



> And how do we get space savings?



The number of distinct columns would be much smaller than the number of
ROW+COL keys.
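
A rough way to see the difference, using the standard Bloom filter sizing
formula m = -n * ln(p) / (ln 2)^2. The row count, column count, and 1%
false-positive rate below are assumptions picked for illustration, not
measurements from any cluster, and BloomSizeSketch is a hypothetical name.

```java
// Standard Bloom filter sizing; the workload numbers below are hypothetical.
class BloomSizeSketch {
    // Bits needed for n keys at false-positive rate p: m = -n * ln(p) / (ln 2)^2
    static long bitsNeeded(long keys, double falsePositiveRate) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-keys * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;    // assumed rows in one store file
        long columnsPerRow = 10L;  // assumed distinct qualifiers, all present in every row
        double p = 0.01;           // assumed target false-positive rate
        System.out.println("ROW+COL bloom bits:     " + bitsNeeded(rows * columnsPerRow, p));
        System.out.println("column-only bloom bits: " + bitsNeeded(columnsPerRow, p));
    }
}
```

At a fixed error rate the size grows linearly with the number of keys, so the
saving is roughly the ratio of ROW+COL keys to distinct column names.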


> There is a bloom at the start of every row already, to speed deletes. IIRC,
> we always read this first before we do anything. Perhaps we could beef it
> up with more than just delete?
>

I have seen something like that in the code.  I am still trying to understand
it better.



>
> St.Ack
>
>
>
> > Jerry
> >
> > On Thu, Dec 3, 2015 at 9:01 AM, Stack <[email protected]> wrote:
> >
> > > On Wed, Dec 2, 2015 at 10:01 PM, Jerry He <[email protected]> wrote:
> > >
> > > > > Thanks for the response.  You got my question correctly.
> > > > > If we are scanning the rows one by one and we have the requested column
> > > > > in the column tracker, we have the row+column to look up in the bloom
> > > > > filter, don't we? We may not be able to filter out the file scanners
> > > > > upfront, but maybe at a later time and a lower level we could skip
> > > > > something?
> > > >
> > > >
> > > > <I've not looked at the code>You are right. If more than one explicit
> > > > column is specified, we could do a bloom check for the second and so on,
> > > > since we'd have the current row to hand. It could make for a nice speedup
> > > > for scans of many explicit columns traversing a dataset that is sparsely
> > > > populated.</I've not looked at the code>
> > >
> > > St.Ack
> > >
> > >
> > >
> > > > Jerry
> > > >
> > > > On Mon, Nov 30, 2015 at 10:55 PM, Stack <[email protected]> wrote:
> > > >
> > > > > > On Mon, Nov 30, 2015 at 9:56 AM, Jerry He <[email protected]> wrote:
> > > > >
> > > > > > Hi, experts
> > > > > >
> > > > > > > HBase supports the ROWCOL bloom filter. ROW+COL would be the bloom
> > > > > > > key. Most of the documentation says that only GET would benefit,
> > > > > > > multi-column GETs as well.
> > > > > >
> > > > > > > If I do a scan with StartRow and EndRow, and also specify columns,
> > > > > > > would the ROWCOL bloom filter provide any benefit in any way?
> > > > > >
> > > > > >
> > > > > > If I understand your question properly, the answer is no. While we
> > > > > > might have a set of columns to check in the bloom, we'd not know the
> > > > > > set of rows between start and end row and so would not be able to
> > > > > > formulate a query against the ROW+COL bloom filter.
> > > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > > > Thank you.
> > > > > >
> > > > > > Jerry
> > > > > >
> > > > >
> > > >
> > >
> >
>
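
The per-row check discussed in the quoted thread (a ROW+COL bloom lookup for
each explicit column once the current row is known) might look roughly like
the sketch below. It is plain Java, not HBase code. RowColScanSketch,
columnsToSeek, and the "row/column" key encoding are hypothetical, and an exact
HashSet stands in for the bloom so the example stays small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Sketch of a per-row ROW+COL bloom check during a scan (hypothetical, not HBase code).
class RowColScanSketch {
    // Returns the explicit columns worth seeking to in the current row;
    // columns the bloom rules out are skipped without touching the file.
    static List<String> columnsToSeek(String currentRow,
                                      List<String> explicitColumns,
                                      Predicate<String> rowColBloom) {
        List<String> seeks = new ArrayList<>();
        for (String col : explicitColumns) {
            String key = currentRow + "/" + col; // hypothetical key encoding
            if (rowColBloom.test(key)) {
                seeks.add(col);
            }
        }
        return seeks;
    }

    public static void main(String[] args) {
        // Exact set used as a stand-in for the bloom: row r1 has columns a and c only.
        Set<String> present = new HashSet<>(Arrays.asList("r1/a", "r1/c"));
        System.out.println(columnsToSeek("r1", Arrays.asList("a", "b", "c"), present::contains));
    }
}
```

A real Bloom filter can also answer true for a key that is absent, so a seek
may still come back empty; it never answers false for a key that is present.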
