[ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539156 ]

Doug Cutting commented on HADOOP-1986:
--------------------------------------

> serializer instances can have state (an input or output stream, that they 
> keep open across each serialization, for example)

Ah, stateful serializers again.  Above we agreed that "stateful" serializers 
could not buffer, since we might wish to put raw binary values between 
serialized objects (as SequenceFile does).  Do you dispute that?  If not, then 
I don't see how per-class serializer instances are a problem.  In the case of 
Writables, the serializer's "state" would just be a DataOutputStream whose 
output could be re-directed.  We also need to permit seeks to the position 
where an object was written, no?  So unless we permit serializers to buffer, I 
still don't see what problematic state a serializer can have.
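
Concretely, I'm imagining something along these lines (just a sketch; the 
interface and class names are illustrative, not a proposed API): the 
serializer's only state is the DataOutputStream it writes to, it can be 
re-pointed at a new stream, and it never buffers.

  import java.io.DataOutputStream;
  import java.io.IOException;
  import java.io.OutputStream;

  import org.apache.hadoop.io.Writable;

  // A serializer that can be re-pointed at a new stream but never buffers.
  interface Serializer<T> {
    void open(OutputStream out) throws IOException;  // redirect output here
    void serialize(T t) throws IOException;          // write t immediately
    void close() throws IOException;
  }

  // For Writables, the only "state" is the DataOutputStream being written to.
  class WritableSerializer<T extends Writable> implements Serializer<T> {
    private DataOutputStream dataOut;

    public void open(OutputStream out) {
      dataOut = (out instanceof DataOutputStream)
          ? (DataOutputStream) out : new DataOutputStream(out);
    }

    public void serialize(T w) throws IOException {
      w.write(dataOut);   // written straight through, nothing retained
    }

    public void close() throws IOException {
      dataOut.close();
    }
  }

Since nothing is buffered, raw binary values can be written to the underlying 
stream between serialize() calls, and the stream position after each call is 
exactly where the object ends, so seeks work.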

Also, don't we permit mixing of serializers in a file?  Couldn't one have, 
e.g., a Record I/O-defined key and a Thrift-defined value?  Unless we prohibit 
that, clients cannot reliably share serializers.
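
If we do permit mixing, the dispatch belongs in the framework, roughly like 
this (again only a sketch, building on the Serializer interface above; the 
factory below is hypothetical, just a lookup from class to serialization):

  import java.io.IOException;
  import java.io.OutputStream;

  // Hypothetical lookup from a class to the serialization framework that
  // handles it (Writable, Record I/O, Thrift, ...).
  interface SerializationFactory {
    <T> Serializer<T> getSerializer(Class<T> c) throws IOException;
  }

  // A file writer holding two independently chosen serializers, so a
  // Record I/O key can be paired with a Thrift value in the same file.
  class PairWriter<K, V> {
    private final Serializer<K> keySerializer;
    private final Serializer<V> valueSerializer;

    PairWriter(SerializationFactory factory, Class<K> keyClass,
               Class<V> valueClass, OutputStream out) throws IOException {
      keySerializer = factory.getSerializer(keyClass);
      valueSerializer = factory.getSerializer(valueClass);
      keySerializer.open(out);
      valueSerializer.open(out);  // both write to the same underlying stream
    }

    void append(K key, V value) throws IOException {
      keySerializer.serialize(key);      // the framework, not client code,
      valueSerializer.serialize(value);  // picks the right serializer each time
    }
  }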

Note that, with these restrictions, using something like Java Serialization 
for small objects will be very expensive, since each object would then carry 
its own stream header and class descriptors.  But those shortcomings are 
exactly why we're not using Java Serialization in the first place, so such 
pain is to be expected.
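
Just to illustrate that cost (this is not part of the patch, only a 
measurement of the per-object overhead): with a fresh ObjectOutputStream per 
value, even a boxed int drags along a stream header and class descriptors, 
dozens of bytes instead of four.

  import java.io.ByteArrayOutputStream;
  import java.io.DataOutputStream;
  import java.io.IOException;
  import java.io.ObjectOutputStream;

  public class SerializationCost {
    public static void main(String[] args) throws IOException {
      ByteArrayOutputStream javaBytes = new ByteArrayOutputStream();
      ObjectOutputStream oos = new ObjectOutputStream(javaBytes);
      oos.writeObject(Integer.valueOf(42));  // header + class descriptors + value
      oos.flush();

      ByteArrayOutputStream rawBytes = new ByteArrayOutputStream();
      DataOutputStream dos = new DataOutputStream(rawBytes);
      dos.writeInt(42);                      // just the four value bytes
      dos.flush();

      System.out.println("Java Serialization: " + javaBytes.size() + " bytes");
      System.out.println("DataOutputStream:   " + rawBytes.size() + " bytes");
    }
  }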

> it's not great, but it's not so bad either

It is bad.  Client code should not have to replicate logic.  The framework 
should encapsulate it.  That's a requirement.

> Well, yes for Thrift and Record I/O but maybe not so for some other platform 
> we may want to support in the future [...]

Tell me more about this supposed platform, how it works, how it constructs 
instances, etc.  I'm having a hard time imagining one that cannot fit within 
the proposed framework.

> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.16.0
>
>         Attachments: SerializableWritable.java, serializer-v1.patch
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable 
> key-value pairs. While it's possible to write Writable wrappers for other 
> serialization frameworks (such as Thrift), this is not very convenient: it 
> would be nicer to be able to use arbitrary types directly, without explicit 
> wrapping and unwrapping.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
