SequenceFile should compress blocks, not individual entries
-----------------------------------------------------------
Key: HADOOP-54
URL: http://issues.apache.org/jira/browse/HADOOP-54
Project: Hadoop
Type: Improvement
Components: io
Versions: 0.2
Reporter: Doug Cutting
Assigned to: Michel Tourn
Fix For: 0.2
SequenceFile currently offers optional compression of individual values. But
both the compression ratio and performance would be much better if sequences
of keys and values were compressed together as blocks. Sync marks should then
be placed only between blocks. This will require some changes to MapFile too,
so that all file positions stored there are the positions of blocks, not of
entries within blocks. This can probably be accomplished by adding a
getBlockStartPosition() method to SequenceFile.Writer.
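
The proposal above could look roughly like the following. This is only a
minimal sketch, not the actual SequenceFile code: the class name
BlockWriterSketch, the block layout (sync mark, entry count, compressed
length, compressed bytes), the placeholder SYNC value and the use of
java.util.zip are all assumptions made for illustration. Only the
getBlockStartPosition() name comes from the description.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;

// Hypothetical stand-in for SequenceFile.Writer; writes to memory, not a file.
class BlockWriterSketch {
    // Placeholder 16-byte sync mark (assumption; a real one would be per-file).
    static final byte[] SYNC = new byte[16];

    private final ByteArrayOutputStream file = new ByteArrayOutputStream();
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final DataOutputStream fileOut = new DataOutputStream(file);
    private final DataOutputStream bufOut = new DataOutputStream(buffer);
    private final int blockSize;   // uncompressed bytes buffered before a flush
    private long blockStart = 0;   // position where the current block will begin
    private int entries = 0;       // entries buffered in the current block

    BlockWriterSketch(int blockSize) { this.blockSize = blockSize; }

    // Position of the block that will hold the next appended entry.
    // An index such as MapFile would read this BEFORE calling append().
    long getBlockStartPosition() { return blockStart; }

    long length() { return file.size(); }

    // Buffer one length-prefixed key/value pair; flush when the block is full.
    void append(byte[] key, byte[] value) {
        try {
            bufOut.writeInt(key.length);
            bufOut.write(key);
            bufOut.writeInt(value.length);
            bufOut.write(value);
            bufOut.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        entries++;
        if (buffer.size() >= blockSize) {
            flushBlock();
        }
    }

    // Compress all buffered entries together and emit them as one block,
    // preceded by a sync mark, so sync marks only appear between blocks.
    void flushBlock() {
        if (entries == 0) {
            return;
        }
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (DeflaterOutputStream deflate = new DeflaterOutputStream(compressed)) {
                buffer.writeTo(deflate);
            }
            fileOut.write(SYNC);
            fileOut.writeInt(entries);
            fileOut.writeInt(compressed.size());
            compressed.writeTo(fileOut);
            fileOut.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.reset();
        entries = 0;
        blockStart = file.size();  // the next block starts at the current end
    }

    public static void main(String[] args) {
        BlockWriterSketch w = new BlockWriterSketch(64);
        for (int i = 0; i < 20; i++) {
            long pos = w.getBlockStartPosition();  // what an index would store
            w.append(("key" + i).getBytes(), ("value" + i).getBytes());
            System.out.println("key" + i + " -> block at " + pos);
        }
        w.flushBlock();
        System.out.println("total bytes: " + w.length());
    }
}
```

In this sketch several consecutive keys map to the same block position, which
is the change MapFile would need to tolerate: a lookup would seek to the block
start and then scan forward within the decompressed block.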