[ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540333 ]
Vivek Ratan commented on HADOOP-1986:
-------------------------------------

>> Above we agreed that "stateful" serializers could not buffer, since we might
>> wish to put raw binary values between serialized objects (as SequenceFile
>> does). Do you dispute that?

No. I agree that serializers should not buffer. But serializer instances can share output streams or other objects, and that's what I meant by 'state'.

It seems to me that what you're saying is that if you want a serialization platform X to work with Hadoop, X should do at least two things:

- X should allow creation of multiple instances of its serializer. So, for example, if X's serializer instances share anything, like library handles or stream objects, X is responsible for dealing with any issues that arise from this sharing, such as initializing or destroying the shared objects.
- X needs to be able to *both* create objects before deserializing them (i.e., those objects should have no-arg constructors, or should all be constructed in a common manner) *and* take in a reference to an existing object and initialize its member variables with deserialized data.

If X follows these, then we get client code that is generic and does not have to 'replicate logic', as you say. Correct? (I've sketched this contract in code at the end of this message.)

I'm all in favor of client code not replicating framework logic. It's definitely an important requirement. But I see it as coming with a price: the two constraints above that X must follow. Now, Thrift or Record I/O shouldn't have any problems with these constraints, which is quite important to know. But the constraints are non-trivial enough that some other platform might not be able to satisfy them. Unfortunately, I do not have a concrete example of such a platform. At the same time, I can realistically imagine a platform that does not force its de/serializable objects to have no-arg constructors (because that can be a severe restriction in the design of an object), and instead requires the caller to pass in an object reference (much like Java Serialization, but without the platform having to create the objects when deserializing). But yes, these are somewhat hypothetical arguments. I also understand that we should perhaps favor a design that supports existing serialization platforms, and not make it too general if generality comes at a price.

At this point, I think it's a gut call. If we feel that having clients not replicate platform logic is more important than the restrictions we're placing on serialization platforms, that's fine. I can certainly see the validity of that, and can't argue strongly against it. I lean (slightly) towards the other side, but I don't have concrete examples to lean on too heavily.

> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.16.0
>
>         Attachments: SerializableWritable.java, serializer-v1.patch
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable key-value pairs. While it's possible to write Writable wrappers for other serialization frameworks (such as Thrift), this is not very convenient: it would be nicer to be able to use arbitrary types directly, without explicit wrapping and unwrapping.
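To make the two constraints concrete, here is a rough sketch of the contract as I understand it. All interface and method names below are mine, for illustration only; I'm not proposing them as the final API, and this isn't taken from serializer-v1.patch.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Constraint 1: the framework may ask X for many independent serializer
// instances. The stream is handed in via open(), so instances don't share
// hidden state, and serialize() must write through without buffering so
// that raw bytes can be interleaved between records (as SequenceFile does).
interface Serializer<T> {
  void open(OutputStream out) throws IOException;
  void serialize(T t) throws IOException;
  void close() throws IOException;
}

// Constraint 2: deserialize() must support both modes -- create a new
// object when passed null, or refill the caller-supplied instance.
interface Deserializer<T> {
  void open(InputStream in) throws IOException;
  T deserialize(T t) throws IOException;
  void close() throws IOException;
}

// Factory through which the framework obtains per-use instances; any
// sharing of library handles behind this is X's responsibility.
interface Serialization<T> {
  Serializer<T> getSerializer(Class<T> c);
  Deserializer<T> getDeserializer(Class<T> c);
}

class GenericClient {
  // Generic client code that replicates no platform logic: it never
  // constructs T itself and never assumes a no-arg constructor.
  static <T> void readAll(Deserializer<T> d, InputStream in, long count)
      throws IOException {
    d.open(in);
    T t = null;
    for (long i = 0; i < count; i++) {
      t = d.deserialize(t); // first call creates t; later calls may reuse it
      // ... hand t to the framework ...
    }
    d.close();
  }
}
{code}

With a contract like this, object creation stays under the platform's control (it can return a fresh object or reuse the one passed in), while the framework's client code stays generic. The price is exactly the two constraints above: a platform that can only fill in caller-created objects, or only create its own, can't implement deserialize() in both modes.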