Thanks, Stack. I will look into the code more as well. Do you think a column-only Bloom filter would help more with this SCAN + explicit columns case, and with saving space?
Jerry

On Thu, Dec 3, 2015 at 9:01 AM, Stack wrote:
> On Wed, Dec 2, 2015 at 10:01 PM, Jerry He wrote:
>
> > Thanks for the response. You got my question correctly.
> > If we are scanning the rows one by one and we have the requested column
> > in the column tracker, we have the row+column to look up in the bloom
> > filter, don't we? We may not be able to filter out the file scanners
> > upfront, but maybe later, at a lower level, we could skip something?
> >
>
> <I've not looked at the code>You are right. If more than one explicit
> column is specified, we could do a bloom check for the second and so on,
> since we'd have the current row to hand. It could make for a nice speedup
> for scans of many explicit columns traversing a dataset that is sparsely
> populated.</I've not looked at the code>.
>
> St.Ack
>
> > Jerry
> >
> > On Mon, Nov 30, 2015 at 10:55 PM, Stack wrote:
> >
> > > On Mon, Nov 30, 2015 at 9:56 AM, Jerry He wrote:
> > >
> > > > Hi, experts
> > > >
> > > > HBASE supports ROWCOL bloom filter. ROW+COL would be the bloom key.
> > > > In most of the documentation, it says only GET would benefit. For
> > > > multi-column as well.
> > > >
> > > > If I do a scan with StartRow and EndRow, and also specify columns,
> > > > would the ROWCOL bloom filter provide any benefit in any way?
> > > >
> > >
> > > If I understand your question properly, the answer is no. While we
> > > might have a set of columns to check in the bloom, we'd not know the
> > > set of rows between start and end row and so would not be able to
> > > formulate a query against the ROW+COL bloom filter.
> > >
> > > St.Ack
> > >
> > > > Thank you.
> > > >
> > > > Jerry
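To make the reasoning in this thread concrete, here is a toy model of a single store file with a ROW+COL bloom filter. It is a sketch under stated assumptions, not HBase code: the class `RowColBloomSketch`, its methods, the row/column separator in the bloom key, the double-hashing scheme, and the `seeks` counter are all hypothetical illustrations. A point lookup knows the full (row, column) key up front, so the filter can rule the file out before any read. A range scan does not know which rows exist between the start and stop rows, so it cannot consult the filter up front. Once the scan is positioned on a row, though, each explicit column yields a complete key, and columns the filter says are definitely absent can be skipped.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of one store file plus a ROW+COL bloom filter (not an HBase class).
final class RowColBloomSketch {
    private static final int NUM_BITS = 1 << 16;
    private static final int NUM_HASHES = 3;

    private final BitSet bits = new BitSet(NUM_BITS);
    // Sorted row -> (column -> value), standing in for the file's sorted cells.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();
    // Counts per-cell reads; the bloom check exists to avoid some of these.
    private int seeks = 0;

    // Assumed bloom key: row and column joined by a separator that appears in neither.
    private static String bloomKey(String row, String col) {
        return row + '\u0000' + col;
    }

    // 32-bit FNV-1a over the UTF-8 bytes, used as the second base hash.
    private static int fnv1a(String key) {
        int h = 0x811c9dc5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: derive NUM_HASHES bit positions from two base hashes.
    private static int[] positions(String key) {
        int h1 = key.hashCode();
        int h2 = fnv1a(key) | 1;
        int[] out = new int[NUM_HASHES];
        for (int i = 0; i < NUM_HASHES; i++) {
            out[i] = Math.floorMod(h1 + i * h2, NUM_BITS);
        }
        return out;
    }

    void put(String row, String col, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(col, value);
        for (int p : positions(bloomKey(row, col))) {
            bits.set(p);
        }
    }

    // False means "definitely absent"; true means "maybe present" (false positives possible).
    boolean mightContain(String row, String col) {
        for (int p : positions(bloomKey(row, col))) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }

    int seeks() {
        return seeks;
    }

    // Point lookup: the full (row, col) key is known before touching the file.
    String get(String row, String col) {
        if (!mightContain(row, col)) {
            return null; // ruled out without reading any cells
        }
        seeks++;
        TreeMap<String, String> cols = rows.get(row);
        return cols == null ? null : cols.get(col);
    }

    // Scan over [startRow, stopRow) with explicit columns. Rows are discovered as the scan
    // advances; only then can each (row, col) pair be checked against the bloom filter.
    List<String> scan(String startRow, String stopRow, List<String> columns) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> entry
                : rows.subMap(startRow, true, stopRow, false).entrySet()) {
            String row = entry.getKey();
            for (String col : columns) {
                if (!mightContain(row, col)) {
                    continue; // skip the seek for this column within the current row
                }
                seeks++;
                String v = entry.getValue().get(col);
                if (v != null) {
                    out.add(row + "/" + col + "=" + v);
                }
            }
        }
        return out;
    }
}
```

The per-row check only saves column seeks inside rows the scan has already reached; it cannot skip rows or the whole file the way a point lookup can, which matches the limitation described in the first reply. The savings grow with the number of explicit columns and with how sparsely the data is populated.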
