[ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537294 ]

Vivek Ratan commented on HADOOP-1986:
-------------------------------------

>> The factory can keep that around. So, if deserializer depends on the type of 
>> the instance passed in, then the deserializer your factory builds should 
>> include the class and create an instance of it when the instance is null. 
>> Java Serialization would not need to do this, but Thrift would. I'm trying 
>> to avoid client code that differs depending on the serializer.
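
As I read it, the quoted approach amounts to something like the sketch below. 
The names here are purely illustrative (this is not the actual patch API): the 
factory captures the class when it builds the deserializer, and the 
deserializer falls back to reflection when no instance is passed in.

import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only, not the patch API.
interface Deserializer<T> {
  void open(InputStream in) throws IOException;
  T deserialize(T reuse) throws IOException;  // reuse may be null
  void close() throws IOException;
}

// The factory would construct one of these with the class it was asked
// for, so that deserialize(null) can still produce an instance.
abstract class ReflectingDeserializer<T> implements Deserializer<T> {
  private final Class<T> clazz;  // captured by the factory at build time

  protected ReflectingDeserializer(Class<T> clazz) {
    this.clazz = clazz;
  }

  public T deserialize(T reuse) throws IOException {
    T target = reuse;
    if (target == null) {
      try {
        target = clazz.newInstance();  // requires a no-arg constructor
      } catch (Exception e) {
        throw new IOException("cannot create " + clazz.getName() + ": " + e);
      }
    }
    fill(target);  // Thrift-style: read fields into the given instance
    return target;
  }

  // A subclass reads the actual bytes for one concrete class.
  protected abstract void fill(T target) throws IOException;
}

Java Serialization wouldn't need the captured class (it reads the class name 
from the stream itself), which is exactly why hiding the difference behind one 
interface is tempting.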

This might be difficult if you have a serializer that can handle lots of 
classes. Take the example of Record I/O. Every class that can be serialized 
inherits from Record. There is only one serializer, the one for Record I/O, 
but it can handle any Record class (and there are an unbounded number of such 
classes). You may want a singleton Record I/O serializer that handles every 
class inheriting from Record, but such a serializer won't know which class to 
deserialize (or it will have to know about a huge number of classes). I 
understand that you're trying to avoid extra client code, but you may end up 
unnecessarily complicating the platform code. Furthermore, conceptually you do 
want the client to distinguish between serializers that create objects and 
those that expect the client to create them. This matters less in Java, with 
its automatic memory management, but in other languages you do want to make it 
explicit who is responsible for memory management. 
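
To make the difficulty concrete, here is a rough sketch in the same style 
(Record and its method names are stand-ins, not the real Record I/O API). A 
single deserializer that covers every Record subclass has no particular class 
to capture, so there is nothing it can instantiate on its own:

import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for the Record I/O base class.
abstract class Record {
  public abstract void readFields(DataInput in) throws IOException;
}

// One deserializer for all Record subclasses. It isn't tied to any
// particular class, so it can only fill in an instance the client
// supplies.
class RecordDeserializer {
  private DataInputStream in;

  public void open(InputStream stream) {
    in = new DataInputStream(stream);
  }

  public Record deserialize(Record reuse) throws IOException {
    if (reuse == null) {
      // Which of the infinitely many Record subclasses should be
      // created here? The deserializer has no way to know.
      throw new IOException("Record I/O needs the client to supply the instance");
    }
    reuse.readFields(in);  // the passed-in record fills itself
    return reuse;
  }

  public void close() throws IOException {
    in.close();
  }
}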

Serializers that create their own objects and pass them back to the client 
are, in many ways, fundamentally different from those that expect clients to 
pass in an object to deserialize. The former expect deserialized objects to 
have a no-argument constructor, and the objects are quite simple wrappers 
around data. In the latter case, the objects are usually much more than simple 
wrappers around member variables, and their constructors can be quite 
complicated. What I'm saying is that these two types of serializers are 
different enough (and a serializer that supports both is rare enough, if one 
exists at all) that you don't want to hide the difference in your common 
serializer interface. A client will either always pass in objects that it 
constructs itself, or always get back new objects from the serializer; I don't 
think it will mix these calls up with the same serializer. So I think it's 
fine, and desirable, for clients to explicitly make different calls to the two 
types of serializers. In fact, it seems likely that most clients will be 
written explicitly for one of these two kinds of serializers, given that a 
client will likely use the same platform for both serialization and 
deserialization. 
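
If the calls stay separate, the split can be as simple as the following sketch 
(again, illustrative names only): give each kind of deserializer its own 
method, so responsibility for object creation is explicit in the signature 
rather than buried in null-handling conventions.

import java.io.IOException;

// Illustrative sketch: the deserializer constructs and returns a new
// object, so T effectively needs a no-argument constructor.
interface CreatingDeserializer<T> {
  T deserialize() throws IOException;
}

// Illustrative sketch: the client constructs the object (however
// complicated its constructor) and the deserializer only fills it in.
interface FillingDeserializer<T> {
  void deserialize(T target) throws IOException;
}

A client written against one of these interfaces states up front which model 
it uses, which also makes the memory-management contract obvious in languages 
without garbage collection.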

> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.16.0
>
>         Attachments: SerializableWritable.java, serializer-v1.patch
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable 
> key-value pairs. While it's possible to write Writable wrappers for other 
> serialization frameworks (such as Thrift), this is not very convenient: it 
> would be nicer to be able to use arbitrary types directly, without explicit 
> wrapping and unwrapping.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
