I want to remove it because I set it up on the wrong column ;) I should have used NAME => 'a' instead of NAME => '@' ;)
I have set up the KeyOnlyFilter (kof) in the code and redeployed. I have
also added the bloom filter on the right column. I will remove the wrong
one later. As soon as the compaction is done I will restart my MR job and
keep my fingers crossed...

2013/1/4, Bryan Beaudreault:
> Why do you want to remove the bloom filter? I think you should keep the
> bloom filter but also use the KeyOnlyFilter to cut down on data
> transferred over the wire.
>
> On Fri, Jan 4, 2013 at 3:28 PM, Jean-Marc Spaggiari wrote:
>
>> Ok. I have activated them on 2 of my main tables and I will re-run the
>> job and see.
>>
>> 2 other questions then ;)
>>
>> 1) I have activated them this way:
>>    alter 'work_proposed', NAME => '@', BLOOMFILTER => 'ROW'
>>    How can I remove them?
>> 2) Should I major_compact to make sure all the hashes are stored?
>>
>> Thanks,
>>
>> JM
>>
>> 2013/1/4, Adrien Mogenet:
>> > On every Get, the BloomFilter acts as a filter (!) on top of each
>> > HFile and allows checking whether a key is absent from the HFile.
>> > So yes, you will benefit from these filters.
>> >
>> > On Fri, Jan 4, 2013 at 8:58 PM, Jean-Marc Spaggiari wrote:
>> >
>> >> Is KeyOnlyFilter using the BloomFilters too?
>> >>
>> >> Here is, with more details, what I'm doing.
>> >>
>> >> A few questions:
>> >> - Can I create one single KeyOnlyFilter and give the same filter to
>> >>   all the gets?
>> >> - Will bloom filters help in such a scenario? My key is small. Let's
>> >>   say 128 bytes on average.
>> >>
>> >> The goal here is to check about 500 entries at a time to validate
>> >> whether they already exist or not.
>> >>
>> >> In my MR, I'm starting when I have more than 100K lines to handle,
>> >> and each line can have up to 1K entries. So it can result in up to
>> >> 100M gets... The job initially took 500 minutes to complete. I have
>> >> added a few pretty good nodes and it's now taking less than 300
>> >> minutes. But I would like to get under 100 minutes if I can...
>> >>
>> >> Thanks,
>> >>
>> >> JM
>> >>
>> >> Vector<Get> gets_entry_exist = new Vector<Get>();
>> >> for (Entry entry : entries.getEntries())
>> >> {
>> >>     Get entry_exist = new Get(entry.toKey());
>> >>     entry_exist.setFilter(new KeyOnlyFilter());
>> >>     gets_entry_exist.add(entry_exist);
>> >> }
>> >>
>> >> Result[] result_entry_exist = table_entry.get(gets_entry_exist);
>> >>
>> >> int index = 0;
>> >> for (Entry entry : entries.getEntries())
>> >> {
>> >>     boolean isEmpty = result_entry_exist[index++].isEmpty();
>> >>     if (isEmpty)
>> >>     {
>> >>         // Process here
>> >>     }
>> >> }
>> >>
>> >> 2013/1/4, Damien Hardy:
>> >> > Hello Jean-Marc,
>> >> >
>> >> > BloomFilters are designed for just that.
>> >> >
>> >> > But they only tell you that a row doesn't exist, based on a hash
>> >> > of the key (not the opposite: 2 rowkeys could have the same hash
>> >> > result).
>> >> >
>> >> > If you want to be sure the rowkey exists you have to search for
>> >> > it in the HFile (the whole mechanism is transparent with the
>> >> > get()).
>> >> >
>> >> > There is also a KeyOnlyFilter
>> >> > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html
>> >> > which prevents the whole columns of the existing key from being
>> >> > returned (which could be heavy).
>> >> >
>> >> > Cheers,
>> >> >
>> >> > --
>> >> > Damien
>> >> >
>> >> > 2013/1/4 Jean-Marc Spaggiari
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> What's the fastest way to know if a row exists?
>> >> >>
>> >> >> Today I'm doing this:
>> >> >>
>> >> >> Get get_entry_exist = new Get(key).addColumn(CF_DATA, C_DATA);
>> >> >> Result entry_exist = table_entry.get(get_entry_exist);
>> >> >>
>> >> >> But should this be faster?
>> >> >>
>> >> >> Get get_entry_exist = new Get(key);
>> >> >> Result entry_exist = table_entry.get(get_entry_exist);
>> >> >>
>> >> >> There is only one CF and one C on my table.
>> >> >>
>> >> >> Or is there an even faster way?
>> >> >>
>> >> >> Also, is there a way to make that even faster? I think
>> >> >> BloomFilters can help, right?
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> JM
>> >
>> > --
>> > Adrien Mogenet
>> > 06.59.16.64.22
>> > http://www.mogenet.me
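
For anyone reading this thread later, here is a rough sketch of the point Damien and Adrien make above: a bloom filter can say "definitely absent" but only "maybe present", so a positive answer still has to be confirmed against the stored data. This is a toy written for illustration, not HBase's implementation. The class BloomSketch, the rowExists helper, the in-memory Set standing in for an HFile, and the sizes (1024 bits, 3 hashes) are all assumptions made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Toy bloom filter, for illustration only (hypothetical, not HBase code).
class BloomSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // FNV-1a, used as the second base hash.
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key);
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(byte[] key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    // false: the key was never added. true: the key MAY have been added.
    boolean mightContain(byte[] key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Existence check: skip the (expensive) store lookup when the filter
    // says "absent", and confirm in the store when it says "maybe".
    // The Set is a stand-in for the data file the filter sits on top of.
    static boolean rowExists(BloomSketch bloom, Set<String> store, String key) {
        if (!bloom.mightContain(key.getBytes(StandardCharsets.UTF_8))) {
            return false;
        }
        return store.contains(key);
    }

    public static void main(String[] args) {
        BloomSketch bloom = new BloomSketch(1024, 3);
        Set<String> store = new HashSet<>();
        for (String key : new String[] {"row-1", "row-2", "row-3"}) {
            store.add(key);
            bloom.add(key.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("row-2 exists: " + rowExists(bloom, store, "row-2"));
        System.out.println("row-999 exists: " + rowExists(bloom, store, "row-999"));
    }
}
```

The saving comes from the early return in rowExists: for keys that were never written, the lookup in the store is skipped most of the time. A false positive only costs one extra lookup and never gives a wrong answer, because the store has the final word.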
