[
https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537294
]
Vivek Ratan commented on HADOOP-1986:
-------------------------------------
>> The factory can keep that around. So, if deserializer depends on the type of
>> the instance passed in, then the deserializer your factory builds should
>> include the class and create an instance of it when the instance is null.
>> Java Serialization would not need to do this, but Thrift would. I'm trying
>> to avoid client code that differs depending on the serializer.
This might be difficult if you have a serializer that can handle many classes.
Take the example of Record I/O. Every class that can be serialized inherits
from Record. There is only one serializer, the Record I/O one, but it can
handle any Record class (and there is an infinite number of such classes). You
may want to create a singleton Record I/O serializer to handle more than one
class that inherits from Record, and it won't know which class to deserialize
(or it will have to keep track of a huge number of classes). I understand that
you're trying to avoid extra client code, but you may end up unnecessarily
complicating the platform code. Furthermore, conceptually you do want the
client to distinguish between serializers that create objects and those that
expect the client to create them. This matters less in Java, with its
automatic memory management, but in other languages you do want to make it
explicit who is responsible for memory management.
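To make the earlier proposal concrete, here is a minimal sketch (all names are
illustrative, not the actual API in the patch) of a deserializer built by a
factory that remembers the target class, so it can allocate an instance itself
when the caller passes in null:

```java
import java.io.*;

// Illustrative sketch only: the factory remembers the class and the
// deserializer uses it to create an instance when the caller passes null.
// Assumes the target class has a no-arg constructor.
interface Deserializer<T> {
    T deserialize(T reuse, DataInput in) throws IOException;
}

class IntHolder {
    int value;
    public IntHolder() { }  // no-arg constructor required by this scheme
}

class IntHolderDeserializer implements Deserializer<IntHolder> {
    private final Class<IntHolder> clazz = IntHolder.class;  // kept around

    public IntHolder deserialize(IntHolder reuse, DataInput in)
            throws IOException {
        IntHolder obj = reuse;
        if (obj == null) {
            try {
                // Create the instance ourselves when the caller didn't.
                obj = clazz.getDeclaredConstructor().newInstance();
            } catch (Exception e) {
                throw new IOException(e.toString());
            }
        }
        obj.value = in.readInt();
        return obj;
    }
}

public class Demo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeInt(42);
        DataInput in = new DataInputStream(
            new ByteArrayInputStream(bytes.toByteArray()));
        // Caller passes null; the deserializer allocates the object.
        IntHolder h = new IntHolderDeserializer().deserialize(null, in);
        System.out.println(h.value);  // prints 42
    }
}
```

The objection above is that a single Record I/O serializer handling every
Record subclass cannot keep one class around in this way; it would need one
such deserializer per class, or some other way to learn the target type.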
Serializers that create their own objects and pass them back to the client are,
in many ways, fundamentally different from those that expect clients to pass in
an object to deserialize. The former expect deserialized objects to have a
constructor with no parameters, and the objects are quite simple wrappers
around data. In the latter case, the objects are usually much more than simple
wrappers around member variables and their constructors can be quite
complicated. What I'm saying here is that these two types of serializers are
different enough (and you will rarely, if ever, see one that supports both)
that you don't want to hide the difference in your common serializer
interface. I think a client will either always pass in objects that it
constructs itself, or always get new objects back from the serializer; it
won't mix the two kinds of calls with the same serializer. So I think
it's fine, and desirable, for clients to explicitly make different calls to the
two types of serializers. In fact, it would seem likely that most clients will
be written explicitly for one of these two kinds of serializers, given that a
client will likely use the same platform for serialization and deserialization.
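One way to keep the distinction explicit, as argued above, is two separate
interfaces rather than one merged call. This is a hypothetical sketch (the
interface and class names are mine, not the patch's):

```java
import java.io.*;

// Hypothetical sketch: keep the two deserializer styles as two distinct
// interfaces, so the allocation responsibility is explicit in the signature.
interface CreatingDeserializer<T> {
    T deserialize(DataInput in) throws IOException;           // serializer allocates
}

interface FillingDeserializer<T> {
    void deserialize(T obj, DataInput in) throws IOException; // client allocates
}

public class TwoStyles {
    static class Point {
        int x, y;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(3);
        out.writeInt(4);

        // Style 1: the serializer creates the object (needs a no-arg constructor).
        CreatingDeserializer<Point> creating = in -> {
            Point p = new Point();
            p.x = in.readInt();
            p.y = in.readInt();
            return p;
        };

        // Style 2: the client constructs the object; the serializer fills it in.
        FillingDeserializer<Point> filling = (p, in) -> {
            p.x = in.readInt();
            p.y = in.readInt();
        };

        Point a = creating.deserialize(new DataInputStream(
            new ByteArrayInputStream(bytes.toByteArray())));
        Point b = new Point();
        filling.deserialize(b, new DataInputStream(
            new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(a.x + "," + a.y + " " + b.x + "," + b.y);  // prints 3,4 3,4
    }
}
```

A client written against one interface simply cannot accidentally call the
other style, which is the point being made about not mixing the two.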
> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
> Key: HADOOP-1986
> URL: https://issues.apache.org/jira/browse/HADOOP-1986
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Reporter: Tom White
> Assignee: Tom White
> Fix For: 0.16.0
>
> Attachments: SerializableWritable.java, serializer-v1.patch
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable
> key-value pairs. While it's possible to write Writable wrappers for other
> serialization frameworks (such as Thrift), this is not very convenient: it
> would be nicer to be able to use arbitrary types directly, without explicit
> wrapping and unwrapping.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.