[
https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538761
]
Vivek Ratan commented on HADOOP-1986:
-------------------------------------
>> Yes, that's more-or-less assumed, but I yet fail to see it as a problem. All
>> record classes are generated from an IDL and it should be easy to generate a
>> no-arg ctor for those. Ditto for thrift. Things that implement Writable
>> today already must have a no-arg ctor. Can you please provide a more
>> detailed example of something that would prove difficult and why it is
>> important that it be easy?
I guess I'm doing a poor job explaining my point because the larger issue seems
to be missed. Given that there are two kinds of deserializers, those that
create objects and those that take in object references, we're discussing what
the deserialization interface should look like to handle both these kinds of
deserializers, right? We seem to have two choices: have the client figure out
which kind of deserializer it is interacting with and have it call the right
deserialize method, or have a single deserialize method and let the
deserializer create an object where necessary or use one provided by the
client. Right? Doug provided an example of the latter, and I thought there were
some issues. The biggest one is this: for a deserializer to create an object,
it needs to know the type of the object that is being deserialized. Some
deserializers (such as Java's serialization, and Writables, I think) know this,
because the class name is part of the serialized data. Others, such as Thrift
or Record I/O, do not serialize the class name (I'm pretty sure about Record
I/O, and the Thrift code I saw some time back didn't serialize class names, as
far as I can remember), so they do not know which object to create. Doug
suggested that the deserializer store the class name when it is created by the
class factory. I said that wouldn't work because you will likely want a
singleton deserializer object to handle deserializing more than one class,
so you cannot tie a deserializer object to a single class. What this means is
that, IMO, you cannot get Thrift or Record I/O deserializers to create objects,
the way they work today, and for them, the client has to pass in an object.
Similarly, there can be other deserializers that always create an object, and
they cannot use one passed in by the client. That is the crux of my argument.
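To make the fill-in-place kind concrete, here's a hypothetical sketch (the class and method names are illustrative, not from the patch): a Record I/O- or Thrift-style deserializer can only fill in a caller-supplied object, because the byte stream carries field values but no class name.

```java
import java.io.*;

// Hypothetical sketch: a Record I/O- or Thrift-style deserializer.
// The serialized bytes contain only field values, never a class name,
// so the deserializer has no way to create the target object itself.
class IntPair {
    int first, second;
    IntPair(int f, int s) { first = f; second = s; } // note: no no-arg ctor needed
}

class IntPairDeserializer {
    // The client passes the object in; the deserializer just fills it,
    // since it cannot learn the target class from the stream.
    void deserialize(DataInput in, IntPair reuse) throws IOException {
        reuse.first = in.readInt();
        reuse.second = in.readInt();
    }
}
```

Nothing in this sketch forces IntPair to have a no-arg constructor, which is exactly the flexibility the client-supplies-the-object style preserves.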

I also mentioned that for a deserializer to create its own objects, it requires
all deserialized objects to support no-arg constructors. Yes, Thrift,
Record I/O, and Writables do so, but if we want to support various other kinds
of serialization platforms, we're forcing every supported platform to use
no-arg constructors. This seems like an unnecessary restriction to me. I don't
have an example of a deserializer where this would be an issue, but I can
easily imagine situations where you have objects without no-arg constructors
(there are lots of objects that we design where we don't want no-arg
constructors) that you want to deserialize. Anyway, this is a minor point,
and mostly theoretical (though valid, IMO). But it adds to my argument that you
want to have separate deserialize methods and let the client call the right
one. (There is also my argument that it's good design to have separate methods
to make memory management explicit, especially for languages like C++, but I
admit it's not a strong argument if we're only looking at Java).
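The separate-methods design I'm arguing for might look something like this (a hypothetical sketch, not the proposed API): the client explicitly calls whichever variant its serialization framework supports, instead of one method guessing.

```java
import java.io.*;

// Hypothetical sketch of the two-method design (illustrative names):
// the client picks the variant its framework supports.
interface Deserializer<T> {
    // For frameworks whose serialized data carries the class name
    // (e.g. Java serialization): the deserializer creates the instance.
    T deserialize(DataInput in) throws IOException;

    // For frameworks like Thrift or Record I/O that don't serialize the
    // class name: the client supplies (and owns) the instance to fill.
    T deserialize(DataInput in, T reuse) throws IOException;
}

// A toy implementation over two-int arrays, just to show both call styles.
class PairDeserializer implements Deserializer<int[]> {
    public int[] deserialize(DataInput in) throws IOException {
        return deserialize(in, new int[2]);   // creating variant
    }
    public int[] deserialize(DataInput in, int[] reuse) throws IOException {
        reuse[0] = in.readInt();
        reuse[1] = in.readInt();
        return reuse;                          // filling variant
    }
}
```

A deserializer that cannot create objects would simply not implement (or would reject) the first method, and the memory-management contract is explicit at each call site.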

Again, my point is that deserializers for Thrift and Record I/O cannot create
objects themselves and will always require the client to pass in the object (or
invoke the deserialize method on a known object), so they, or a layer around
them, cannot support a single deserialize method that can optionally take in
an object from a client or create one of its own, at least not without a lot of
pain.
> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
> Key: HADOOP-1986
> URL: https://issues.apache.org/jira/browse/HADOOP-1986
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Reporter: Tom White
> Assignee: Tom White
> Fix For: 0.16.0
>
> Attachments: SerializableWritable.java, serializer-v1.patch
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable
> key-value pairs. While it's possible to write Writable wrappers for other
> serialization frameworks (such as Thrift), this is not very convenient: it
> would be nicer to be able to use arbitrary types directly, without explicit
> wrapping and unwrapping.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.