Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/NewFileFormat

------------------------------------------------------------------------------
  If index included offset to every key, would be able to use it to figure if file had an entry for the queried key and every index lookup would get us exact offset. But such an index would be too large to keep in memory (If values are small, file could have many entries. Files are usually about 64MB but can grow to an upper-bound of about 1G though this is configurable and nothing to stop it being configured up from this).

+ == New Format ==
+  * [https://issues.apache.org/jira/browse/HBASE-647 HBASE-647]: Have data, metadata, indices and bloomfilters, etc., all rolled up in the one file. Could do this with [https://issues.apache.org/jira/browse/HADOOP-3315 TFile].
+
  == Other File Formats ==
  Cassandra uses a Sequence File. It adds key/values in blocks of 128 by default. On the 128th entry, an index for the block keys is inlined and then a new block begins. Block offsets are kept out in an index file as in MapFile. Bloomfilters are on by default.

- == New Format ==
- Have data, metadata, indices and bloomfilters, etc., all rolled up in the one file.
-
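
The index trade-off described in the page text can be sketched roughly as follows. This is only an illustration, not HBase or Cassandra code: the names `build_sparse_index` and `lookup` are hypothetical, list positions stand in for file byte offsets, and the block size of 128 is borrowed from the Cassandra default quoted above. A sparse index keeps one key per block, so a lookup narrows the search to a single block but still has to scan that block to learn whether the key is present.

```python
from bisect import bisect_right

BLOCK_SIZE = 128  # entries per block (assumption: the default quoted in the page)


def build_sparse_index(sorted_keys, block_size=BLOCK_SIZE):
    """Keep only the first key of each block.

    Entry i of the result covers positions
    [i * block_size, (i + 1) * block_size) of sorted_keys.
    """
    return [sorted_keys[i] for i in range(0, len(sorted_keys), block_size)]


def lookup(sorted_keys, sparse_index, key, block_size=BLOCK_SIZE):
    """Return the position of key in sorted_keys, or None if it is absent."""
    # Find the last block whose first key is <= the queried key.
    block = bisect_right(sparse_index, key) - 1
    if block < 0:
        return None  # key sorts before the first block
    start = block * block_size
    end = min(start + block_size, len(sorted_keys))
    # The sparse index cannot say whether the key exists; scan the block.
    for pos in range(start, end):
        if sorted_keys[pos] == key:
            return pos
    return None


# Hypothetical data: 1000 sorted row keys, indexed one key per 128 entries.
keys = [f"row{i:05d}" for i in range(1000)]
index = build_sparse_index(keys)
```

With 1000 keys the sparse index holds 8 entries rather than 1000, which is the memory saving the page is after; the cost is the in-block scan on every lookup.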
