[
https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tom White updated HADOOP-1986:
------------------------------
Attachment: serializer-v6.patch
I found a source of overhead: WritableSerializer was unnecessarily wrapping the
OutputStream in a new DataOutputStream even when it was already one. So I've replaced
{code}
public void open(OutputStream out) {
  dataOut = new DataOutputStream(out);
}
{code}
with
{code}
public void open(OutputStream out) {
  if (out instanceof DataOutputStream) {
    // The stream is already a DataOutputStream; reuse it rather than wrapping it again.
    dataOut = (DataOutputStream) out;
  } else {
    dataOut = new DataOutputStream(out);
  }
}
{code}
and similarly for the deserializer.
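For the read side, the analogous change (sketched here; the field name follows the serializer's convention and may differ slightly in the patch) avoids re-wrapping an InputStream that is already a DataInputStream:
{code}
public void open(InputStream in) {
  if (in instanceof DataInputStream) {
    // Reuse the existing DataInputStream rather than adding another wrapper.
    dataIn = (DataInputStream) in;
  } else {
    dataIn = new DataInputStream(in);
  }
}
{code}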
With this change (see the v6 patch) the overhead is almost completely eliminated
(benchmarked on Java 6 on Linux):
1867316142 ns - trunk
1931475429 ns - patch v5, 3.4% overhead
1876353143 ns - patch v6, 0.5% overhead
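(Overhead here is the slowdown relative to trunk, e.g. (1931475429 - 1867316142) / 1867316142 ≈ 3.4% for v5, and (1876353143 - 1867316142) / 1867316142 ≈ 0.5% for v6.)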
I think this is now ready to be committed. I'll put it in the submit queue.
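For anyone reviewing, the serializer side of the pluggable API under discussion looks roughly like this (a sketch for orientation; serializer-v6.patch has the authoritative signatures):
{code}
public interface Serializer<T> {
  // Prepare the serializer for writing to the given stream.
  void open(OutputStream out) throws IOException;
  // Write t to the stream supplied to open().
  void serialize(T t) throws IOException;
  // Release any resources, closing the underlying stream.
  void close() throws IOException;
}
{code}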
> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
> Key: HADOOP-1986
> URL: https://issues.apache.org/jira/browse/HADOOP-1986
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Reporter: Tom White
> Assignee: Tom White
> Fix For: 0.17.0
>
> Attachments: hadoop-serializer-v2.tar.gz,
> SequenceFileWriterBenchmark.java, SerializableWritable.java,
> serializer-v1.patch, serializer-v2.patch, serializer-v3.patch,
> serializer-v4.patch, serializer-v5.patch, serializer-v6.patch
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable
> key-value pairs. While it's possible to write Writable wrappers for other
> serialization frameworks (such as Thrift), this is not very convenient: it
> would be nicer to be able to use arbitrary types directly, without explicit
> wrapping and unwrapping.