[jira] Commented: (HADOOP-3460) SequenceFileAsBinaryOutputFormat

Chris Douglas (JIRA) Wed, 04 Jun 2008 13:25:10 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602446#action_12602446 ]


Chris Douglas commented on HADOOP-3460:
---------------------------------------

This looks great; just a few points:

* Different properties for the output key/value classes aren't necessary; you 
can use the existing methods, like JobConf::getOutputKeyClass.
* The generic signature on the RecordWriter can be 
<BytesWritable,BytesWritable> if the signature on SequenceFileOutputFormat 
(SeqFileOF) were made generic:
{noformat}
-public class SequenceFileOutputFormat
-extends FileOutputFormat<WritableComparable, Writable> {
+public class SequenceFileOutputFormat<K extends WritableComparable,
+                                      V extends Writable>
+    extends FileOutputFormat<K,V> {
{noformat}
That permits SequenceFileAsBinaryOutputFormat (SeqFABOF) to be declared as:
{noformat}
public class SequenceFileAsBinaryOutputFormat
    extends SequenceFileOutputFormat<BytesWritable,BytesWritable> {
{noformat}
This generates a warning in MultipleSequenceFileOutputFormat, but it's spurious 
and can be suppressed.
* Since record compression is not supported, it might be worthwhile to override 
OutputFormat::checkOutputSpecs and throw if record compression is requested.
* This should be in o.a.h.mapred.lib rather than o.a.h.mapred.
* Keeping a WritableValueBytes instance around (and adding a reset method) 
might be useful, so a new one isn't created for each write.
* The IllegalArgumentException in WritableValueBytes should probably be an 
UnsupportedOperationException.
* WritableValueBytes should be a _static_ inner class.
* The indentation on the anonymous RecordWriter::close should be consistent 
with the project's coding standards.
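
To make the three WritableValueBytes points concrete, here is a rough, self-contained sketch. It is not taken from the attached patch and does not use the Hadoop classes: the ValueBytes interface below is a hypothetical stand-in for SequenceFile.ValueBytes, a plain byte[] stands in for BytesWritable, and the reset signature and the writeAll helper are assumptions made only for illustration. It shows a static nested class, one instance reused across writes via reset, and UnsupportedOperationException for the unsupported compressed path.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Sketch only: stand-in types, not the Hadoop classes or the attached patch. */
class ReusableValueBytesSketch {

    /** Hypothetical stand-in for Hadoop's SequenceFile.ValueBytes. */
    interface ValueBytes {
        void writeUncompressedBytes(DataOutputStream out) throws IOException;
        void writeCompressedBytes(DataOutputStream out) throws IOException;
        int getSize();
    }

    /** Static nested class: carries no hidden reference to an enclosing instance. */
    static class WritableValueBytes implements ValueBytes {
        private byte[] value = new byte[0]; // stand-in for a BytesWritable
        private int length;

        /** Point this wrapper at the next record instead of allocating a new wrapper. */
        void reset(byte[] value, int length) {
            this.value = value;
            this.length = length;
        }

        @Override
        public void writeUncompressedBytes(DataOutputStream out) throws IOException {
            out.write(value, 0, length);
        }

        @Override
        public void writeCompressedBytes(DataOutputStream out) throws IOException {
            // The argument is valid; it is the operation that is unsupported.
            throw new UnsupportedOperationException(
                "record compression is not supported");
        }

        @Override
        public int getSize() {
            return length;
        }
    }

    /** Writes every record through one shared wrapper and returns the raw bytes. */
    static byte[] writeAll(WritableValueBytes wvb, byte[]... records) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            for (byte[] record : records) {
                wvb.reset(record, record.length);
                wvb.writeUncompressedBytes(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        WritableValueBytes wvb = new WritableValueBytes(); // allocated once
        byte[] out = writeAll(wvb, new byte[] {1, 2}, new byte[] {3});
        System.out.println(out.length);
    }
}
```

In the real RecordWriter the shared instance would presumably live in a field and be reset at the top of each write call; the helper above only imitates that loop.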

> SequenceFileAsBinaryOutputFormat
> --------------------------------
>
>                 Key: HADOOP-3460
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3460
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Koji Noguchi
>            Priority: Minor
>         Attachments: HADOOP-3460-part1.patch
>
>
> Add an OutputFormat to write raw bytes as keys and values to a SequenceFile.
> In C++-Pipes, we're using SequenceFileAsBinaryInputFormat to read 
> SequenceFiles.
> However, we currently don't have a way to *write* a SequenceFile efficiently 
> without going through extra (de)serializations.
> I'd like to store the correct classnames for key/values but use BytesWritable 
> to write
> (in order for the next Java or Pig code to be able to read this SequenceFile).

