Unless there is significant feedback to the contrary, making bloom filters a 
simple true/false setting that sizes itself will be an incompatible change.

During migration, columns which have bloom filters enabled currently will have 
the bloom filter erased and disabled.

Columns may have bloom filters only if they are enabled when the column is 
created.

If a column is later modified to disable the bloom filter, the filter will be 
erased and cannot be re-enabled.

There is one short-term (0.2.0) option for migration and enabling bloom filters, 
which will be very expensive: reading the column twice, first to establish the 
number of entries that are needed, and second to create the bloom filter.
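
To make the cost concrete, here is a rough sketch of what the two-pass approach 
amounts to. It is not HBase or Hadoop code: the class TwoPassBloomSketch, the 
Iterable standing in for a re-readable column scan, the target false positive 
rate, and the hash functions are all assumptions made for illustration. The 
sizing uses the standard bloom filter formulas m = -n ln p / (ln 2)^2 bits and 
k = (m / n) ln 2 hash functions.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical illustration only; not an HBase or Hadoop class. */
final class TwoPassBloomSketch {
    final BitSet bits;
    final int numBits;    // m: size of the bit vector
    final int numHashes;  // k: number of hash probes per key

    TwoPassBloomSketch(long entries, double falsePositiveRate) {
        long n = Math.max(1L, entries);
        double ln2 = Math.log(2.0);
        // Standard sizing: m = -n ln p / (ln 2)^2, k = (m / n) ln 2.
        double m = Math.ceil(-n * Math.log(falsePositiveRate) / (ln2 * ln2));
        numBits = (int) Math.max(8.0, Math.min((double) Integer.MAX_VALUE, m));
        numHashes = (int) Math.max(1L, Math.round((double) numBits / n * ln2));
        bits = new BitSet(numBits);
    }

    /** i-th probe position via double hashing (an assumed hash scheme). */
    private int index(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = 0x811c9dc5;               // 32-bit FNV-1a offset basis
        for (byte b : key) {
            h2 ^= (b & 0xff);
            h2 *= 16777619;                // 32-bit FNV prime
        }
        return Math.floorMod(h1 + i * (h2 | 1), numBits);
    }

    void add(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Reads the column twice: once to count, once to populate. */
    static TwoPassBloomSketch build(Iterable<byte[]> column, double falsePositiveRate) {
        long count = 0;
        for (byte[] ignored : column) {    // pass 1: establish the number of entries
            count++;
        }
        TwoPassBloomSketch filter = new TwoPassBloomSketch(count, falsePositiveRate);
        for (byte[] key : column) {        // pass 2: create the bloom filter
            filter.add(key);
        }
        return filter;
    }
}
```

The expense is visible in build(): every entry in the column is read twice, and 
the filter cannot be allocated until the first pass has finished.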

There is one long-term option: convince Hadoop that MapFile should be 
subclassable, which would entail either changing private members to protected 
members or providing accessors to the private members in MapFile. Because 
hadoop-0.18.0 is in feature freeze, any change of this sort would have to wait 
for hadoop-0.19.0. hbase-0.3.0 will target hadoop-0.18.x, so this change would 
have to wait until hbase-0.4.0.

The question is: how many people use bloom filters today? It is our belief that 
they are not particularly useful as implemented. If you do use bloom filters 
today, would you object to a process by which you would create a new 
bloom-filter-enabled column and copy your data to the new column?

---
Jim Kellerman, Senior Engineer; Powerset
