SequenceFile should compress blocks, not individual entries
-----------------------------------------------------------

         Key: HADOOP-54
         URL: http://issues.apache.org/jira/browse/HADOOP-54
     Project: Hadoop
        Type: Improvement
  Components: io  
    Versions: 0.2    
    Reporter: Doug Cutting
 Assigned to: Michel Tourn 
     Fix For: 0.2


SequenceFile can currently compress individual values, as an option.  But 
both the compression ratio and performance would be much better if sequences 
of keys and values were compressed together as blocks.  Sync marks should 
then be placed only between blocks.  This will require some changes to 
MapFile too, so that all file positions stored there are the positions of 
blocks, not of entries within blocks.  This can probably be accomplished by 
adding a getBlockStartPosition() method to SequenceFile.Writer.
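
To make the proposal concrete, here is a minimal sketch of a block-oriented 
writer.  It is not the real SequenceFile.Writer: the class name 
BlockWriterSketch, the 4-byte sync mark, the on-disk layout, and the use of 
java.util.zip.Deflater are all assumptions made for illustration.  Only the 
method name getBlockStartPosition() comes from the description above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Hypothetical sketch only; this is not the real SequenceFile.Writer. */
class BlockWriterSketch {
    // Assumed 4-byte sync mark, chosen for illustration only.
    static final byte[] SYNC = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};

    // In-memory stand-in for the output file.
    private final ByteArrayOutputStream file = new ByteArrayOutputStream();
    // Uncompressed entries waiting to be written as one block.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final int entriesPerBlock;
    private int buffered = 0;
    private long blockStart = 0;

    BlockWriterSketch(int entriesPerBlock) {
        this.entriesPerBlock = entriesPerBlock;
    }

    /** File position of the block that will hold the next appended entry. */
    long getBlockStartPosition() {
        return blockStart;
    }

    /** Buffers one entry; writes a block once enough entries are buffered. */
    void append(String key, String value) {
        putBytes(pending, key.getBytes(StandardCharsets.UTF_8));
        putBytes(pending, value.getBytes(StandardCharsets.UTF_8));
        if (++buffered >= entriesPerBlock) {
            flushBlock();
        }
    }

    /** Writes sync mark, entry count, then all buffered entries compressed together. */
    void flushBlock() {
        if (buffered == 0) {
            return;
        }
        byte[] compressed = deflate(pending.toByteArray());
        file.write(SYNC, 0, SYNC.length);   // sync marks appear only between blocks
        putInt(file, buffered);
        putBytes(file, compressed);
        pending.reset();
        buffered = 0;
        blockStart = file.size();           // the next block starts here
    }

    /** Bytes written so far (flushed blocks only). */
    byte[] toByteArray() {
        return file.toByteArray();
    }

    private static void putInt(ByteArrayOutputStream out, int v) {
        out.write(ByteBuffer.allocate(4).putInt(v).array(), 0, 4);
    }

    // Length-prefixed byte array.
    private static void putBytes(ByteArrayOutputStream out, byte[] b) {
        putInt(out, b.length);
        out.write(b, 0, b.length);
    }

    private static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

Under this sketch, an index writer such as MapFile would call 
getBlockStartPosition() just before each append() and store that value, so 
every indexed position points at a sync mark at the start of a block rather 
than at an entry inside compressed data.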

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
