Unless there is significant feedback to the contrary, making bloom filters a simple true/false setting and self-sizing will be an incompatible change.
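Below is a rough sketch of what "self-sizing" implies, assuming the standard bloom filter sizing formulas. Once the number of entries n is known, the bit-array size is m = -n ln p / (ln 2)^2 and the hash count is k = (m / n) ln 2, for a target false-positive rate p. This is why filling a filter for an existing column takes two reads: one to count the entries and one to insert them. Several details are illustrative assumptions, not HBase's implementation: the hypothetical `read_column` callable, the 1% default rate, and the SHA-256 double-hashing scheme.

```python
import hashlib
import math


def bloom_parameters(n_entries, false_positive_rate=0.01):
    """Return (num_bits, num_hashes) sized for n_entries at the target rate."""
    if n_entries <= 0:
        raise ValueError("n_entries must be positive")
    # m = -n ln p / (ln 2)^2, rounded up to a whole bit
    m = math.ceil(-n_entries * math.log(false_positive_rate) / (math.log(2) ** 2))
    # k = (m / n) ln 2, at least one hash function
    k = max(1, round((m / n_entries) * math.log(2)))
    return m, k


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Assumed scheme: double hashing over two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def build_filter_two_pass(read_column, false_positive_rate=0.01):
    """read_column is a hypothetical zero-argument callable that restarts a
    scan of the column's row keys (as bytes) each time it is called."""
    # Pass 1: count entries so the filter can size itself.
    n = sum(1 for _ in read_column())
    m, k = bloom_parameters(max(n, 1), false_positive_rate)
    bloom = BloomFilter(m, k)
    # Pass 2: re-read the column and insert every key.
    for key in read_column():
        bloom.add(key)
    return bloom
```

The second full scan is the "very expensive" part. A filter that is sized when the column is created avoids the counting pass, because it only ever sees new entries.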
During migration, any column that currently has a bloom filter enabled will have that bloom filter erased and disabled. A column may have a bloom filter only if it is enabled when the column is created. If a column is later modified to disable the bloom filter, the filter will be erased and cannot be re-enabled.

There is one short-term (0.2.0) option for migrating and enabling bloom filters, and it will be very expensive: reading the column twice, first to establish the number of entries needed and second to create the bloom filter.

There is one long-term option: convince Hadoop that MapFile should be subclassable. That would entail either changing its private members to protected members or providing accessors to the private members in MapFile. Because hadoop-0.18.0 is in feature freeze, any change of this sort would have to wait for hadoop-0.19.0. hbase-0.3.0 will target hadoop-0.18.x, so this change would have to wait until hbase-0.4.0.

The question is: how many people use bloom filters today? We believe they are not particularly useful as currently implemented. If you do use bloom filters today, would you object to a process in which you create a new column with bloom filters enabled and copy your data into it?

---
Jim Kellerman, Senior Engineer; Powerset
