[
https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tom White updated HADOOP-1986:
------------------------------
Attachment: serializer-v6.patch
I found a source of overhead: WritableSerializer was unnecessarily wrapping the
OutputStream in a new DataOutputStream even when it was already one. So I've replaced
{code}
public void open(OutputStream out) {
  dataOut = new DataOutputStream(out);
}
{code}
with
{code}
public void open(OutputStream out) {
  if (out instanceof DataOutputStream) {
    // The stream is already a DataOutputStream; reuse it rather than wrapping it again.
    dataOut = (DataOutputStream) out;
  } else {
    dataOut = new DataOutputStream(out);
  }
}
{code}
and similarly for the deserializer.
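For the read side, the analogous change (sketched here; the field name follows the serializer's convention and may differ slightly in the patch) avoids re-wrapping an InputStream that is already a DataInputStream:
{code}
public void open(InputStream in) {
  if (in instanceof DataInputStream) {
    // Reuse the existing DataInputStream rather than adding another wrapper.
    dataIn = (DataInputStream) in;
  } else {
    dataIn = new DataInputStream(in);
  }
}
{code}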
With this change (see the v6 patch) the overhead is almost completely eliminated
(benchmarked on Java 6 on Linux):
1867316142 ns - trunk
1931475429 ns - patch v5, 3.4% overhead
1876353143 ns - patch v6, 0.5% overhead
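(Overhead here is the slowdown relative to trunk, e.g. (1931475429 - 1867316142) / 1867316142 ≈ 3.4% for v5, and (1876353143 - 1867316142) / 1867316142 ≈ 0.5% for v6.)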
I think this is now ready to be committed. I'll put it in the submit queue.
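For anyone reviewing, the serializer side of the pluggable API under discussion looks roughly like this (a sketch for orientation; serializer-v6.patch has the authoritative signatures):
{code}
public interface Serializer<T> {
  // Prepare the serializer for writing to the given stream.
  void open(OutputStream out) throws IOException;
  // Write t to the stream supplied to open().
  void serialize(T t) throws IOException;
  // Release any resources, closing the underlying stream.
  void close() throws IOException;
}
{code}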
> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
> Key: HADOOP-1986
> URL: https://issues.apache.org/jira/browse/HADOOP-1986
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Reporter: Tom White
> Assignee: Tom White
> Fix For: 0.17.0
>
> Attachments: hadoop-serializer-v2.tar.gz,
> SequenceFileWriterBenchmark.java, SerializableWritable.java,
> serializer-v1.patch, serializer-v2.patch, serializer-v3.patch,
> serializer-v4.patch, serializer-v5.patch, serializer-v6.patch
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable
> key-value pairs. While it's possible to write Writable wrappers for other
> serialization frameworks (such as Thrift), this is not very convenient: it
> would be nicer to be able to use arbitrary types directly, without explicit
> wrapping and unwrapping.