[ http://issues.apache.org/jira/browse/HADOOP-54?page=comments#action_12423530 ]
Owen O'Malley commented on HADOOP-54:
-------------------------------------
Ok, after talking it over with Eric, here is what is hopefully a last pass at
this.
All in SequenceFile:
public static class Writer {
... current stuff ...
/**
* Append an uncompressed representation of the key and a raw representation
* of the value as the next record.
*/
public void appendRaw(byte[] key, int keyOffset, int keyLength, RawValue value);
}
public static class RecordCompressWriter extends Writer {
... constructor and some overriding methods ...
}
public static class BlockCompressWriter extends Writer {
... constructor and some overriding methods ...
}
public static class Reader {
... current stuff ...
/**
* Read the next key into the key buffer and return the value as a RawValue.
* @param key a buffer to store the uncompressed serialized key in, as a
*            sequence of bytes
* @return null if there are no more key/value pairs in the file
*/
public RawValue readRaw(DataOutputStream key);
}
public static interface RawValue {
// writes the uncompressed bytes to the outStream
public void writeUncompressedBytes(DataOutputStream outStream);
// is this raw value compressed (using zip)?
public boolean canWriteCompressed();
// write the (zip) compressed bytes. note that it will NOT compress the
// bytes if they are not already compressed
// throws IllegalArgumentException if the value is not already compressed
public void writeCompressedBytes(DataOutputStream outStream);
// when we add custom compressors, we would add:
public boolean canWriteCompressed(Class compressorClass);
public void writeCompressedBytes(Class compressorClass, DataOutputStream outStream);
}
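
To make the RawValue contract above concrete, here is a minimal sketch of two
possible implementations, one holding plain bytes and one holding bytes that
are already deflated. It is only an illustration under stated assumptions: the
interface is redeclared locally as a stand-in for the proposed
SequenceFile.RawValue, "zip" is taken to mean java.util.zip deflate, the
"throws IOException" clauses are added because DataOutputStream writes can
throw, and the class names UncompressedRawValue and CompressedRawValue are
hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Local stand-in for the proposed SequenceFile.RawValue (assumption: the
// IOException clauses are not part of the proposal as written).
interface RawValue {
    void writeUncompressedBytes(DataOutputStream outStream) throws IOException;
    boolean canWriteCompressed();
    void writeCompressedBytes(DataOutputStream outStream) throws IOException;
}

// Hypothetical value that is stored as plain, uncompressed bytes.
final class UncompressedRawValue implements RawValue {
    private final byte[] data;

    UncompressedRawValue(byte[] data) {
        this.data = data.clone();
    }

    public void writeUncompressedBytes(DataOutputStream outStream) throws IOException {
        outStream.write(data);
    }

    public boolean canWriteCompressed() {
        return false;
    }

    public void writeCompressedBytes(DataOutputStream outStream) {
        // Per the proposal, the value is never compressed on demand.
        throw new IllegalArgumentException("value is not already compressed");
    }
}

// Hypothetical value that is stored as already-deflated bytes.
final class CompressedRawValue implements RawValue {
    private final byte[] deflated;

    private CompressedRawValue(byte[] deflated) {
        this.deflated = deflated;
    }

    // Helper for the sketch: deflate plain bytes once, up front.
    static CompressedRawValue fromPlain(byte[] plain) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer)) {
            out.write(plain);
        }
        return new CompressedRawValue(buffer.toByteArray());
    }

    public void writeUncompressedBytes(DataOutputStream outStream) throws IOException {
        // Inflate on the way out so the caller sees the original bytes.
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(deflated))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                outStream.write(chunk, 0, n);
            }
        }
    }

    public boolean canWriteCompressed() {
        return true;
    }

    public void writeCompressedBytes(DataOutputStream outStream) throws IOException {
        // Pass the stored bytes through untouched; no recompression.
        outStream.write(deflated);
    }
}
```

A caller copying records between files would check canWriteCompressed()
first and call writeCompressedBytes() only when it returns true, so that
already-compressed values are passed through without being inflated and
deflated again.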
> SequenceFile should compress blocks, not individual entries
> -----------------------------------------------------------
>
> Key: HADOOP-54
> URL: http://issues.apache.org/jira/browse/HADOOP-54
> Project: Hadoop
> Issue Type: Improvement
> Components: io
> Affects Versions: 0.2.0
> Reporter: Doug Cutting
> Assigned To: Arun C Murthy
> Fix For: 0.5.0
>
> Attachments: VIntCompressionResults.txt
>
>
> SequenceFile will optionally compress individual values. But both
> compression and performance would be much better if sequences of keys and
> values are compressed together. Sync marks should only be placed between
> blocks. This will require some changes to MapFile too, so that all file
> positions stored there are the positions of blocks, not entries within
> blocks. Probably this can be accomplished by adding a
> getBlockStartPosition() method to SequenceFile.Writer.
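
The compression claim in the description above can be illustrated with a
small experiment. The sketch below is not SequenceFile code: it deflates a
set of synthetic, made-up records once individually and once concatenated
into a single block, using java.util.zip, and reports the total sizes. The
class name BlockVsRecordCompression and the record contents are hypothetical,
and real data may behave differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

final class BlockVsRecordCompression {

    // Size in bytes of the deflated form of data.
    static int deflatedSize(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer)) {
            out.write(data);
        }
        return buffer.size();
    }

    // Synthetic records that share most of their content (an assumption
    // chosen to resemble repetitive key/value data).
    static byte[][] sampleRecords(int count) {
        byte[][] records = new byte[count][];
        for (int i = 0; i < count; i++) {
            records[i] = ("user=" + i + ";status=ok;region=us-west")
                .getBytes(StandardCharsets.UTF_8);
        }
        return records;
    }

    // Total size when every record is compressed on its own.
    static int perRecordTotal(byte[][] records) throws IOException {
        int total = 0;
        for (byte[] record : records) {
            total += deflatedSize(record);
        }
        return total;
    }

    // Size when all records are concatenated and compressed as one block.
    static int blockTotal(byte[][] records) throws IOException {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] record : records) {
            block.write(record);
        }
        return deflatedSize(block.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        byte[][] records = sampleRecords(1000);
        System.out.println("per-record: " + perRecordTotal(records) + " bytes");
        System.out.println("block:      " + blockTotal(records) + " bytes");
    }
}
```

With short records, each individually compressed value pays the fixed
stream overhead and cannot reuse patterns seen in its neighbours, which is
the effect block compression is meant to avoid.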