[ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532998 ]

Tom White commented on HADOOP-1986:
-----------------------------------

Vivek,

> I'm thinking about serialization not just for key-value pairs for Map/Reduce, 
> but also in other places

I agree that it would be useful to have a common serialization mechanism for 
all parts of Hadoop. The serialization mechanism proposed so far is likely to 
be applicable more widely, since it is so general: it talks in terms of 
input/output streams and parameterized types. 

This Jira issue is confined to the MapReduce part, since we have to start 
somewhere. I think it would be a useful exercise to think through the 
implications of the design for other parts of Hadoop before committing any 
changes, though.

> I don't think you want a serializer/deserializer per class. 

Not per concrete class, agreed. But per base class (e.g. Writable, 
Serializable, Thriftable, etc.).
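
A per-base-class serializer could look something like the following sketch 
(all names here are illustrative, not the API actually proposed in the 
patch); Java's built-in serialization stands in for a framework such as 
Writable or Thrift:

```java
import java.io.*;

// Illustrative sketch only: a serializer chosen per base class, so one
// implementation covers every concrete type extending that base.
interface Serializer<T> {
    void serialize(T t, OutputStream out) throws IOException;
    T deserialize(InputStream in) throws IOException;
    boolean accepts(Class<?> c);   // does this serializer cover the given type?
}

// One implementation covering everything that is java.io.Serializable,
// analogous to a WritableSerializer or ThriftSerializer for their base types.
class JavaSerializer implements Serializer<Serializable> {
    public void serialize(Serializable t, OutputStream out) throws IOException {
        ObjectOutputStream oos = new ObjectOutputStream(out);
        oos.writeObject(t);
        oos.flush();
    }
    public Serializable deserialize(InputStream in) throws IOException {
        try {
            return (Serializable) new ObjectInputStream(in).readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException("class not found: " + e.getMessage());
        }
    }
    public boolean accepts(Class<?> c) {
        return Serializable.class.isAssignableFrom(c);
    }
}
```

A WritableSerializer would have the same shape, delegating to 
write(DataOutput)/readFields(DataInput) instead.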

> Someone still needs to implement the code for serializing/deserializing that 
> class and I don't see any
> discussion on Hadoop support for Thrift or Record which the user can just 
> invoke. plus, if you think of
> using this mechanism for Hadoop RPC, we will have so many instances of the 
> Serializer<T> interface. You're
> far better off having a HadoopSerializer class that takes in any object and 
> automatically
> serializes/deserializes it. All a user has to do is decide which 
> serialization platform to use.

I think you pretty much describe where I would like to get to. If people are 
using Thrift, for example (and there is a common Thrift interface), then 
there would be a ThriftSerializer that would just work for them, with little 
or no configuration. While it should still be relatively easy to write a 
custom serializer/deserializer, most people will use the standard ones for 
the standard serialization platforms.

There is a question about where these serializers would go - e.g. would 
ThriftSerializer go in core Hadoop?
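
The "HadoopSerializer class that takes in any object" could then be a small 
factory that walks the registered framework serializers and dispatches on 
base type. A self-contained sketch (names purely illustrative, with plain 
Java serialization standing in for a real framework):

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: one serializer per base class, as discussed.
interface Serializer<T> {
    void serialize(T t, OutputStream out) throws IOException;
    T deserialize(InputStream in) throws IOException;
    boolean accepts(Class<?> c);
}

// Stand-in framework serializer covering java.io.Serializable types.
class JavaSerializer implements Serializer<Serializable> {
    public void serialize(Serializable t, OutputStream out) throws IOException {
        ObjectOutputStream oos = new ObjectOutputStream(out);
        oos.writeObject(t);
        oos.flush();
    }
    public Serializable deserialize(InputStream in) throws IOException {
        try {
            return (Serializable) new ObjectInputStream(in).readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException("class not found: " + e.getMessage());
        }
    }
    public boolean accepts(Class<?> c) {
        return Serializable.class.isAssignableFrom(c);
    }
}

// The factory: given any class, pick the first registered serializer that
// covers it. Frameworks (Writable, Thrift, Record I/O, ...) would each
// register one entry; the user only decides which platform their types use.
class SerializationFactory {
    private final List<Serializer<?>> serializers = new ArrayList<Serializer<?>>();

    public void register(Serializer<?> s) {
        serializers.add(s);
    }

    @SuppressWarnings("unchecked")
    public <T> Serializer<T> getSerializer(Class<T> c) {
        for (Serializer<?> s : serializers) {
            if (s.accepts(c)) {
                return (Serializer<T>) s;
            }
        }
        throw new IllegalArgumentException("no serializer registered for " + c.getName());
    }
}
```

Whether the standard entries live in core Hadoop or are contributed as 
plugins is exactly the open question above.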


> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>             Fix For: 0.16.0
>
>         Attachments: SerializableWritable.java
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable 
> key-value pairs. While it's possible to write Writable wrappers for other 
> serialization frameworks (such as Thrift), this is not very convenient: it 
> would be nicer to be able to use arbitrary types directly, without explicit 
> wrapping and unwrapping.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
