[jira] Updated: (HADOOP-3460) SequenceFileAsBinaryOutputFormat

Koji Noguchi (JIRA) Fri, 06 Jun 2008 10:22:06 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Koji Noguchi updated HADOOP-3460:
---------------------------------

    Attachment: HADOOP-3460-part3.patch

 bq.  1.  The testcase: doesn't need a main method, you might want to break up 
the check for forbidding record compression into a separate test, 

Separated the test into three: testbinary, 
testSequenceOutputClassDefaultsToMapRedOutputClass, and 
testcheckOutputSpecsForbidRecordCompression.

Also, I had a bug in the test: checkOutputSpecs was throwing an exception 
because the output path was not set, not because RECORD compression was enabled.
Fixed it.
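The pitfall above is easy to reproduce: when a check validates several conditions in order, a test that only asserts "an exception was thrown" can pass for the wrong reason. Below is a minimal sketch, using a hypothetical `SpecCheck` class (not Hadoop's `checkOutputSpecs`), with plain strings standing in for `SequenceFile.CompressionType`:

```java
// Hypothetical stand-in for an output-spec check; not the Hadoop API.
// It validates two conditions in order, so a test that leaves the first
// one unsatisfied fails on that condition before reaching the second.
final class SpecCheck {
    static void check(String outputPath, String compressionType) {
        if (outputPath == null) {
            throw new IllegalStateException("output path not set");
        }
        if ("RECORD".equals(compressionType)) {
            throw new IllegalStateException("RECORD compression not supported");
        }
    }

    // Returns the exception message, or null if the check passed.
    static String failureReason(String outputPath, String compressionType) {
        try {
            check(outputPath, compressionType);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Buggy setup: no output path, so the "wrong" exception fires.
        System.out.println(failureReason(null, "RECORD"));
        // Fixed setup: output path set, so only the compression rule can fail.
        System.out.println(failureReason("/tmp/out", "RECORD"));
        // Sanity check: BLOCK compression with an output path passes (prints null).
        System.out.println(failureReason("/tmp/out", "BLOCK"));
    }
}
```

Satisfying every precondition except the one under test, and asserting on the failure reason rather than on the mere presence of an exception, keeps such a test from passing for the wrong reason.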

bq. and the call to JobConf::setInputPath is generating a warning (replace with 
FileInputFormat::addInputPath)

Ah. I should have compiled with -Djavac.args="-Xlint -Xmaxwarns 1000".
Done.


bq.   2. WritableValueBytes::writeCompressedBytes no longer throws 
IllegalArgumentException, so that can be removed from its signature

I left it in because the original SequenceFile.ValueBytes interface declares it:
{noformat} 
    public void writeCompressedBytes(DataOutputStream outStream) 
      throws IllegalArgumentException, IOException;
{noformat} 
Should I still take it out?
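Some background for the question: `IllegalArgumentException` is an unchecked exception (a subclass of `RuntimeException`), so listing it in a `throws` clause is documentation only. An implementing class may omit it without breaking the interface contract, and callers are never forced to catch it either way. Below is a minimal sketch, using a hypothetical `RawValue` interface in place of `SequenceFile.ValueBytes` and a hypothetical `ByteArrayValue` in place of the patch's `WritableValueBytes`; the `UnsupportedOperationException` behavior is an assumption for illustration only:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical interface modeled on the quoted SequenceFile.ValueBytes methods.
interface RawValue {
    void writeUncompressedBytes(DataOutputStream outStream) throws IOException;

    void writeCompressedBytes(DataOutputStream outStream)
        throws IllegalArgumentException, IOException;
}

// Hypothetical implementation wrapping already-serialized bytes.
// Its writeCompressedBytes drops IllegalArgumentException from the throws
// clause; this compiles because the exception is unchecked, and an overriding
// method may always declare fewer exceptions than the method it overrides.
final class ByteArrayValue implements RawValue {
    private final byte[] bytes;

    ByteArrayValue(byte[] bytes) {
        this.bytes = bytes;
    }

    @Override
    public void writeUncompressedBytes(DataOutputStream outStream) throws IOException {
        // Copy the stored bytes through unchanged.
        outStream.write(bytes, 0, bytes.length);
    }

    @Override
    public void writeCompressedBytes(DataOutputStream outStream) throws IOException {
        // Assumed behavior for this toy: per-record compression is unsupported.
        throw new UnsupportedOperationException("record compression not supported");
    }

    // Calls both methods through the interface type and reports the outcome.
    // No catch for IllegalArgumentException is needed even via RawValue.
    static String demo() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        RawValue value = new ByteArrayValue(new byte[] {1, 2, 3});
        value.writeUncompressedBytes(out);
        out.flush();
        try {
            value.writeCompressedBytes(out);
            return "wrote " + buffer.size() + " bytes, compression allowed";
        } catch (UnsupportedOperationException e) {
            return "wrote " + buffer.size() + " bytes, " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

Either choice compiles and is source-compatible for callers; the only difference is what the signature documents.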

bq.   3. SeqFABOF::checkOutputSpecs doesn't need to list InvalidJobConfException

Done.

> SequenceFileAsBinaryOutputFormat
> --------------------------------
>
>                 Key: HADOOP-3460
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3460
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: HADOOP-3460-part1.patch, HADOOP-3460-part2.patch, 
> HADOOP-3460-part3.patch
>
>
> Add an OutputFormat to write raw bytes as keys and values to a SequenceFile.
> In C++ Pipes, we're using SequenceFileAsBinaryInputFormat to read SequenceFiles.
> However, we currently don't have a way to *write* a SequenceFile efficiently
> without going through extra (de)serializations.
> I'd like to store the correct class names for keys/values but use BytesWritable
> to write them (so that subsequent Java or Pig code can read this SequenceFile).
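The idea in the description, keeping the logical class names while passing already-serialized bytes straight through, can be illustrated with a toy container. This is a minimal sketch under stated assumptions: `RawPairWriter` and its layout (two UTF class names, then length-prefixed key and value bytes) are hypothetical and are neither the real SequenceFile format nor the `SequenceFileAsBinaryOutputFormat` API. The sample bytes are 4-byte big-endian ints, which is how `IntWritable` serializes itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical toy container (NOT the real SequenceFile layout): the header
// records the logical key/value class names, and each record holds key and
// value bytes that were already serialized elsewhere (e.g. by C++ code).
final class RawPairWriter {
    private final DataOutputStream out;

    RawPairWriter(DataOutputStream out, String keyClass, String valueClass) throws IOException {
        this.out = out;
        // Store the real class names so a later reader knows how to deserialize.
        out.writeUTF(keyClass);
        out.writeUTF(valueClass);
    }

    // Appends one record; the bytes are copied verbatim, with no
    // deserialize/re-serialize round trip.
    void append(byte[] key, byte[] value) throws IOException {
        out.writeInt(key.length);
        out.write(key);
        out.writeInt(value.length);
        out.write(value);
    }

    // Writes one record, then reads the header and record back.
    static String roundTrip(byte[] key, byte[] value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        RawPairWriter writer = new RawPairWriter(new DataOutputStream(buffer),
            "org.apache.hadoop.io.IntWritable", "org.apache.hadoop.io.IntWritable");
        writer.append(key, value);

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        String keyClass = in.readUTF();
        String valueClass = in.readUTF();
        byte[] k = new byte[in.readInt()];
        in.readFully(k);
        byte[] v = new byte[in.readInt()];
        in.readFully(v);
        boolean same = Arrays.equals(k, key) && Arrays.equals(v, value);
        return keyClass + " " + valueClass + " " + same;
    }

    public static void main(String[] args) throws IOException {
        // Serialized forms of the ints 1 and 42 (4-byte big-endian).
        byte[] key = {0, 0, 0, 1};
        byte[] value = {0, 0, 0, 42};
        System.out.println(roundTrip(key, value));
    }
}
```

The writer never interprets the bytes; only the reader, guided by the stored class names, would need to deserialize them.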

-- 
This message is automatically generated by JIRA.
