[Hadoop Wiki] Trivial Update of "Hbase/NewFileFormat" by stack

Apache Wiki Fri, 03 Oct 2008 12:34:14 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/NewFileFormat

------------------------------------------------------------------------------
  
  If the index included an offset for every key, we could use it to tell whether 
the file had an entry for the queried key, and every index lookup would give us 
the exact offset.  But such an index would be too large to keep in memory: if 
values are small, a file could have many entries.  Files are usually about 64MB 
but can grow to an upper bound of about 1GB; this limit is configurable, and 
nothing stops it being configured higher.
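
To put rough numbers on the memory concern above, here is a back-of-the-envelope sketch. The entry size (64 bytes), key size (32 bytes), offset width (8 bytes) and index interval (every 128th key) are illustrative assumptions, not figures from this page; the function names are hypothetical.

```python
MB = 1024 * 1024


def full_index_bytes(file_bytes, avg_entry_bytes, avg_key_bytes, offset_bytes=8):
    """Approximate size of an index holding every key plus its file offset."""
    entries = file_bytes // avg_entry_bytes
    return entries * (avg_key_bytes + offset_bytes)


def sparse_index_bytes(file_bytes, avg_entry_bytes, avg_key_bytes,
                       interval, offset_bytes=8):
    """Approximate size of an index holding only every `interval`-th key."""
    entries = file_bytes // avg_entry_bytes
    indexed = -(-entries // interval)  # ceiling division
    return indexed * (avg_key_bytes + offset_bytes)


# Assumed workload: a 64MB file of small entries (all sizes are assumptions).
file_size = 64 * MB
full = full_index_bytes(file_size, avg_entry_bytes=64, avg_key_bytes=32)
sparse = sparse_index_bytes(file_size, avg_entry_bytes=64, avg_key_bytes=32,
                            interval=128)

print(full // MB, "MB for a full index")
print(sparse // 1024, "KB for a sparse index")
```

Under these assumptions the full index is more than half the size of the data file itself, while the sparse index is a few hundred kilobytes, at the cost of a short scan after each lookup.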
  
+ == New Format ==
+  * [https://issues.apache.org/jira/browse/HBASE-647 HBASE-647]: Have data, 
metadata, indices, bloomfilters, etc., all rolled up in one file.  This could 
be done with [https://issues.apache.org/jira/browse/HADOOP-3315 TFile].
+ 
  == Other File Formats ==
  Cassandra uses a Sequence File.  It adds key/values in blocks of 128 entries by 
default.  On the 128th entry, an index of the block's keys is inlined and then a 
new block begins.  Block offsets are kept in a separate index file, as in MapFile.  
Bloomfilters are on by default.
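
The block layout described above can be sketched roughly as follows. This is only an illustration of the idea (entries grouped into blocks, a per-block key index written inline at the end of each block, and block offsets returned separately as they would be for an external index file). The byte layout, the `write_blocks` name and the length-prefixed encoding are assumptions made for the example, not Cassandra's actual on-disk format; bloomfilters are left out.

```python
import io
import struct

BLOCK_ENTRIES = 128  # block size in entries, per the default mentioned above


def write_blocks(entries, data, block_size=BLOCK_ENTRIES):
    """Write sorted (key, value) byte pairs to `data` in fixed-count blocks.

    Returns a list of (first_key, block_offset) pairs, standing in for the
    separate index file that holds block offsets.
    """
    entries = list(entries)
    block_index = []
    for start in range(0, len(entries), block_size):
        block = entries[start:start + block_size]
        block_offset = data.tell()
        block_index.append((block[0][0], block_offset))

        inline = []  # (key, offset relative to the start of this block)
        for key, value in block:
            inline.append((key, data.tell() - block_offset))
            data.write(struct.pack(">II", len(key), len(value)))
            data.write(key)
            data.write(value)

        # Inline the index of this block's keys, then the next block begins.
        data.write(struct.pack(">I", len(inline)))
        for key, rel_offset in inline:
            data.write(struct.pack(">II", len(key), rel_offset))
            data.write(key)
    return block_index


buf = io.BytesIO()
sample = [(b"k%04d" % i, b"v") for i in range(300)]
index = write_blocks(sample, buf)
print(len(index), "blocks written")
```

A reader would search the returned block index for the last block whose first key is not greater than the queried key, seek to that block, and use the inlined index to find the entry.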
  
- == New Format ==
- Have data, metadata, indices and bloomfilters, etc., all rolled up in the one 
file.
-
