I'm happy with where this is going.
I'm wondering whether we could define writeCompressed() to just do the
right thing, rather than require the checks. Since the right thing
is well defined, if the "rawValue" is not correctly compressed, that
can be handled on the call, yes?
(I'm happy either way, just think about this as you code it, ok?)
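To make the suggestion concrete, here is a minimal sketch of a value holder whose writeCompressed() compresses on the call when the bytes it holds are still raw. Everything here is an assumption for illustration: the class name ValueBytesSketch, the "compressed" flag, and the use of java.util.zip deflate in place of whatever codec the patch actually uses. It is not the code from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch, not the HADOOP-54 patch: the value remembers whether
// its bytes are already compressed, so callers never have to check.
final class ValueBytesSketch {
    private final byte[] data;
    private final boolean compressed;

    ValueBytesSketch(byte[] data, boolean compressed) {
        this.data = data.clone();
        this.compressed = compressed;
    }

    /** Always emits compressed bytes, deflating on the call if needed. */
    void writeCompressed(DataOutputStream out) throws IOException {
        out.write(compressed ? data : deflate(data));
    }

    /** Always emits raw bytes, inflating on the call if needed. */
    void writeUncompressed(DataOutputStream out) throws IOException {
        out.write(compressed ? inflate(data) : data);
    }

    // Deflate stands in for the real codec (an assumption of this sketch).
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream z = new DeflaterOutputStream(buf)) {
            z.write(raw);
        }
        return buf.toByteArray();
    }

    static byte[] inflate(byte[] packed) throws IOException {
        try (InflaterInputStream z =
                 new InflaterInputStream(new ByteArrayInputStream(packed))) {
            return z.readAllBytes();
        }
    }
}
```

With this shape the caller just calls writeCompressed() and gets the same bytes whether the value arrived raw or already compressed. The cost is a hidden compression pass on some calls, which may or may not be acceptable on the hot path.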
On Jul 26, 2006, at 12:26 AM, Doug Cutting (JIRA) wrote:
[ http://issues.apache.org/jira/browse/HADOOP-54?page=comments#action_12423560 ]
Doug Cutting commented on HADOOP-54:
------------------------------------
I mostly have minor naming quibbles.
The Writer method should be named just 'next', not 'nextRaw'.
The new Writer subclasses should not be public, but rather should
be created by a factory method.
The RawValue class might better be named 'ValueBytes', and its
methods can simply be called writeCompressed(), writeUncompressed(),
etc.
Finally, a substantive remark: we should not allocate a new
RawValue for each key read. So the new Reader methods should be:
public ValueBytes createValueBytes();
public void next(DataOutputStream key, ValueBytes value);
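The reuse pattern these two methods imply can be sketched as follows: the caller creates one ValueBytes up front and hands the same instance to every next() call, so nothing is allocated per record. This is an in-memory illustration only. The ReaderSketch class, its record list, the set() and getSize() helpers, and the boolean return value (used here only so the loop can stop, where the signature above returns void) are all assumptions, not the SequenceFile code.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory stand-in for a reader; each record is {key, value}.
final class ReaderSketch {

    /** Reusable holder: its buffer grows as needed and is then recycled. */
    static final class ValueBytes {
        private byte[] buf = new byte[0];
        private int length;

        void set(byte[] src) {
            if (buf.length < src.length) {
                buf = new byte[src.length]; // grow only when too small
            }
            System.arraycopy(src, 0, buf, 0, src.length);
            length = src.length;
        }

        int getSize() {
            return length;
        }

        void writeUncompressed(DataOutputStream out) throws IOException {
            out.write(buf, 0, length);
        }
    }

    private final Iterator<byte[][]> records;

    ReaderSketch(List<byte[][]> records) {
        this.records = records.iterator();
    }

    /** Called once by the caller, before the read loop. */
    ValueBytes createValueBytes() {
        return new ValueBytes();
    }

    /** Writes the next key to 'key' and refills 'value' in place. */
    boolean next(DataOutputStream key, ValueBytes value) throws IOException {
        if (!records.hasNext()) {
            return false;
        }
        byte[][] kv = records.next();
        key.write(kv[0]);
        value.set(kv[1]);
        return true;
    }
}
```

A caller would do "ValueBytes v = reader.createValueBytes();" once and then loop on "reader.next(keyOut, v)", consuming v before the next iteration overwrites it.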
SequenceFile should compress blocks, not individual entries
-----------------------------------------------------------
Key: HADOOP-54
URL: http://issues.apache.org/jira/browse/HADOOP-54
Project: Hadoop
Issue Type: Improvement
Components: io
Affects Versions: 0.2.0
Reporter: Doug Cutting
Assigned To: Arun C Murthy
Fix For: 0.5.0
Attachments: VIntCompressionResults.txt
SequenceFile will optionally compress individual values. But both
compression and performance would be much better if sequences of
keys and values are compressed together. Sync marks should only
be placed between blocks. This will require some changes to
MapFile too, so that all file positions stored there are the
positions of blocks, not entries within blocks. Probably this can
be accomplished by adding a getBlockStartPosition() method to
SequenceFile.Writer.
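The compression claim above can be checked with a small experiment: deflate each value on its own and sum the sizes, then deflate all the values as one block. The sketch below uses java.util.zip as a stand-in codec, and the class and method names are hypothetical. It says nothing about the actual SequenceFile format or about where sync marks go.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;

// Hypothetical comparison of per-entry versus per-block compression.
final class BlockCompressionSketch {

    /** Size in bytes of 'raw' after deflate with default settings. */
    static int deflatedSize(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream z = new DeflaterOutputStream(buf)) {
            z.write(raw);
        }
        return buf.size();
    }

    /** Total size when every value is compressed on its own. */
    static int perEntrySize(List<byte[]> values) throws IOException {
        int total = 0;
        for (byte[] v : values) {
            total += deflatedSize(v);
        }
        return total;
    }

    /** Size when all values are concatenated and compressed as one block. */
    static int blockSize(List<byte[]> values) throws IOException {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] v : values) {
            block.write(v);
        }
        return deflatedSize(block.toByteArray());
    }

    /** Short, similar records, the case where per-entry compression does worst. */
    static List<byte[]> sampleValues(int n) {
        List<byte[]> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(("user=alice;status=ok;id=" + i)
                .getBytes(StandardCharsets.UTF_8));
        }
        return values;
    }
}
```

Comparing perEntrySize(sampleValues(1000)) with blockSize(sampleValues(1000)) shows the effect: each tiny entry pays the codec's fixed overhead and cannot share redundancy with its neighbours, while the block can.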