What about HTable.exists?
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#exists(org.apache.hadoop.hbase.client.Get)
I think that should work if the Get has only the row key.

Mohamed

On Fri, Jan 4, 2013 at 3:17 PM, Adrien Mogenet <[email protected]> wrote:

> On every Get, the BloomFilter acts as a filter (!) on top of each HFile
> and allows checking whether a key is absent from the HFile. So yes, you
> will benefit from these filters.
>
>
> On Fri, Jan 4, 2013 at 8:58 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>
> > Is KeyOnlyFilter using the BloomFilters too?
> >
> > Here is, with more details, what I'm doing.
> >
> > A few questions.
> > - Can I create one single KeyOnlyFilter and give the same filter to
> > all the gets?
> > - Will bloom filters benefit in such a scenario? My key is small. Let's
> > say 128 bytes on average.
> >
> > The goal here is to check about 500 entries at a time to validate if
> > they already exist or not.
> >
> > In my MR, I'm starting when I have more than 100K lines to handle, and
> > each line can have up to 1K entries. So it can result in up to 100M
> > gets... The job initially took 500 minutes to complete. I have added a
> > few pretty good nodes and it's now taking less than 300 minutes. But I
> > would like to get under 100 minutes if I can...
> >
> > Thanks,
> >
> > JM
> >
> > Vector<Get> gets_entry_exist = new Vector<Get>();
> > for (Entry entry : entries.getEntries())
> > {
> >     Get entry_exist = new Get(entry.toKey());
> >     entry_exist.setFilter(new KeyOnlyFilter());
> >     gets_entry_exist.add(entry_exist);
> > }
> >
> > Result[] result_entry_exist = table_entry.get(gets_entry_exist);
> >
> > int index = 0;
> > for (Entry entry : entries.getEntries())
> > {
> >     boolean isEmpty = result_entry_exist[index++].isEmpty();
> >     if (isEmpty)
> >     {
> >         // Process here
> >     }
> > }
> >
> >
> > 2013/1/4, Damien Hardy <[email protected]>:
> > > Hello Jean-Marc,
> > >
> > > BloomFilters are designed for just that.
> > >
> > > But they tell you whether a row doesn't exist, based on a hash of the
> > > key (not the opposite: 2 rowkeys could have the same hash result).
> > >
> > > If you want to be sure the rowkey exists you have to search for it in
> > > the HFile (the whole mechanism is transparent with the get()).
> > >
> > > There is also a KeyOnlyFilter
> > > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html
> > > which prevents the whole columns of the existing key from being
> > > returned (which could be heavy).
> > >
> > > Cheers,
> > >
> > > --
> > > Damien
> > >
> > >
> > > 2013/1/4 Jean-Marc Spaggiari <[email protected]>
> > >
> > >> Hi,
> > >>
> > >> What's the fastest way to know if a row exists?
> > >>
> > >> Today I'm doing this:
> > >>
> > >> Get get_entry_exist = new Get(key).addColumn(CF_DATA, C_DATA);
> > >> Result entry_exist = table_entry.get(get_entry_exist);
> > >>
> > >> But should this be faster?
> > >> Get get_entry_exist = new Get(key);
> > >> Result entry_exist = table_entry.get(get_entry_exist);
> > >>
> > >> There is only one CF and one C on my table.
> > >>
> > >> Or is there an even faster way?
> > >>
> > >> Also, is there a way to make that even faster? I think BloomFilters
> > >> can help, right?
> > >>
> > >> Thanks,
> > >>
> > >> JM
> > >>
> > >
> >
>
>
> --
> Adrien Mogenet
> 06.59.16.64.22
> http://www.mogenet.me
>
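
To make the Bloom filter point from this thread concrete: such a filter can answer "definitely absent" or "possibly present", but never "definitely present", because two row keys can set the same bits. Below is a minimal Java sketch of that behaviour. It is not HBase's actual implementation; the class name RowKeyBloomSketch, the FNV-1a plus double-hashing scheme, and the sizes used in main are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Toy Bloom filter over row keys (hypothetical; not HBase's implementation). */
class RowKeyBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    RowKeyBloomSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // FNV-1a 32-bit hash, used as the first base hash.
    private static int fnv1a(byte[] key) {
        int h = 0x811c9dc5;
        for (byte b : key) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Bit position of the i-th probe, derived from two base hashes (double hashing).
    private int position(byte[] key, int i) {
        int h1 = fnv1a(key);
        int h2 = Arrays.hashCode(key) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Records a key by setting all of its probe bits. */
    void add(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    /** false means the key was never added; true only means it may have been. */
    boolean mightContain(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RowKeyBloomSketch filter = new RowKeyBloomSketch(1024, 3);
        filter.add("row-1".getBytes(StandardCharsets.UTF_8));
        filter.add("row-2".getBytes(StandardCharsets.UTF_8));

        // Always true for an added key: a Bloom filter has no false negatives.
        System.out.println(filter.mightContain("row-1".getBytes(StandardCharsets.UTF_8)));
        // false unless this key happens to collide on every probe (a false positive).
        System.out.println(filter.mightContain("row-999".getBytes(StandardCharsets.UTF_8)));
    }
}
```

A "false" answer lets a reader skip a file entirely, which is where the saving comes from when most of the checked keys do not exist yet. A "true" answer still requires the real lookup in the file to confirm the row.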
