[
https://issues.apache.org/jira/browse/HADOOP-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602446#action_12602446
]
Chris Douglas commented on HADOOP-3460:
---------------------------------------
This looks great; just a few points:
* Different properties for the output key/value classes aren't necessary; you
can use the existing methods, like JobConf::getOutputKeyClass.
* The generic signature on the RecordWriter can be
<BytesWritable,BytesWritable> if the signature on SeqFileOF were correct:
{noformat}
-public class SequenceFileOutputFormat
-extends FileOutputFormat<WritableComparable, Writable> {
+public class SequenceFileOutputFormat<K extends WritableComparable,
+ V extends Writable>
+ extends FileOutputFormat<K,V> {
{noformat}
Permitting SeqFABOF:
{noformat}
public class SequenceFileAsBinaryOutputFormat
extends SequenceFileOutputFormat<BytesWritable,BytesWritable> {
{noformat}
This generates a warning in MultipleSequenceFileOutputFormat, but it's spurious
and can be suppressed.
* Since record compression is not supported, it might be worthwhile to override
OutputFormat::checkOutputSpecs and throw if it's attempted.
* This should be in o.a.h.mapred.lib rather than o.a.h.mapred.
* Keeping a WritableValueBytes instance around (and adding a reset method)
might be useful, so a new one isn't created for each write.
* The IllegalArgumentException in WritableValueBytes should probably be an
UnsupportedOperationException.
* WritableValueBytes should be a _static_ inner class.
* The indentation on the anonymous RecordWriter::close should be consistent
with the standards.
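To illustrate the three WritableValueBytes points together (reuse via a reset method, UnsupportedOperationException, static nesting), here is a minimal self-contained sketch. It is not the code from the patch: the RawValue interface is a hypothetical stand-in for the value-bytes interface the SequenceFile writer consumes, a plain byte[] stands in for BytesWritable, and the class name ValueBytesSketch is invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class ValueBytesSketch {

  /** Hypothetical stand-in for the raw-value interface a SequenceFile writer accepts. */
  interface RawValue {
    void writeUncompressedBytes(DataOutputStream out) throws IOException;
    void writeCompressedBytes(DataOutputStream out) throws IOException;
    int getSize();
  }

  /** Static nested class: it carries no hidden reference to an enclosing instance. */
  static class WritableValueBytes implements RawValue {
    private byte[] value;   // assumption: byte[] in place of BytesWritable
    private int length;

    /** Re-point this wrapper at a new payload instead of allocating a new wrapper. */
    void reset(byte[] value, int length) {
      this.value = value;
      this.length = length;
    }

    @Override
    public void writeUncompressedBytes(DataOutputStream out) throws IOException {
      // Only the first 'length' bytes are valid; the backing array may be larger.
      out.write(value, 0, length);
    }

    @Override
    public void writeCompressedBytes(DataOutputStream out) {
      // The operation is unsupported; the argument itself is not at fault.
      throw new UnsupportedOperationException(
          "WritableValueBytes does not support record compression");
    }

    @Override
    public int getSize() {
      return length;
    }
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(sink);

    // One wrapper instance is shared across every write.
    WritableValueBytes wvb = new WritableValueBytes();
    byte[][] records = { {1, 2, 3}, {4, 5} };
    for (byte[] record : records) {
      wvb.reset(record, record.length);
      wvb.writeUncompressedBytes(out);
    }
    out.flush();
    System.out.println("bytes written: " + sink.size());
  }
}
```

In the RecordWriter, the shared instance would be a field that is reset at the top of each write call, so the per-record cost is two field assignments rather than an allocation.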
> SequenceFileAsBinaryOutputFormat
> --------------------------------
>
> Key: HADOOP-3460
> URL: https://issues.apache.org/jira/browse/HADOOP-3460
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Reporter: Koji Noguchi
> Priority: Minor
> Attachments: HADOOP-3460-part1.patch
>
>
> Add an OutputFormat to write raw bytes as keys and values to a SequenceFile.
> In C++-Pipes, we're using SequenceFileAsBinaryInputFormat to read
> SequenceFiles.
> However, we currently don't have a way to *write* a SequenceFile efficiently
> without going through extra (de)serializations.
> I'd like to store the correct class names for keys/values but use BytesWritable
> to write
> (in order for the next Java or Pig code to be able to read this SequenceFile).