[ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540333 ]
Vivek Ratan commented on HADOOP-1986:
-------------------------------------

>> Above we agreed that "stateful" serializers could not buffer, since we might
>> wish to put raw binary values between serialized objects (as SequenceFile
>> does). Do you dispute that?

No. I agree that serializers should not buffer. But serializer instances can share output streams or other objects, and that's what I meant by 'state'.

It seems to me that what you're saying is that if you want a serialization platform X to work with Hadoop, X should do at least two things:

- X should allow creation of multiple instances of its serializer. So, for example, if X's serializer instances share anything, like library handles or stream objects, X is responsible for dealing with any issues that arise from this sharing, such as initializing or destroying the shared objects.
- X needs to be able to *both* create objects before deserializing them (i.e., those objects should have no-arg constructors, or should all be constructed in a common manner) *and* take in a reference to an existing object and initialize its member variables with deserialized data.

If X follows these, then we get client code that is generic and does not have to 'replicate logic', as you say. Correct? (I've sketched this contract in code at the end of this message.)

I'm all in favor of client code not replicating framework logic. It's definitely an important requirement. But I see it as coming with a price: the two constraints above that X must follow. Now, Thrift or Record I/O shouldn't have any problems with these constraints, which is quite important to know. But the constraints are non-trivial enough that some other platform might not be able to satisfy them. Unfortunately, I do not have a concrete example of such a platform. At the same time, I can realistically imagine a platform that does not force its de/serializable objects to have no-arg constructors (because that can be a severe restriction in the design of an object), and instead requires the caller to pass in an object reference (much like Java Serialization, but without the platform having to create the objects when deserializing). But yes, these are somewhat hypothetical arguments. I also understand that we should perhaps favor a design that supports existing serialization platforms, and not make it too general if generality comes at a price.

At this point, I think it's a gut call. If we feel that having clients not replicate platform logic is more important than the restrictions we're placing on serialization platforms, that's fine. I can certainly see the validity of that, and can't argue strongly against it. I lean (slightly) towards the other side, but I don't have concrete examples to lean on too heavily.

> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.16.0
>
>         Attachments: SerializableWritable.java, serializer-v1.patch
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable key-value pairs. While it's possible to write Writable wrappers for other serialization frameworks (such as Thrift), this is not very convenient: it would be nicer to be able to use arbitrary types directly, without explicit wrapping and unwrapping.
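To make the two constraints concrete, here is a rough sketch of the contract as I understand it. All interface and method names below are mine, for illustration only; I'm not proposing them as the final API, and this isn't taken from serializer-v1.patch.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Constraint 1: the framework may ask X for many independent serializer
// instances. The stream is handed in via open(), so instances don't share
// hidden state, and serialize() must write through without buffering so
// that raw bytes can be interleaved between records (as SequenceFile does).
interface Serializer<T> {
  void open(OutputStream out) throws IOException;
  void serialize(T t) throws IOException;
  void close() throws IOException;
}

// Constraint 2: deserialize() must support both modes -- create a new
// object when passed null, or refill the caller-supplied instance.
interface Deserializer<T> {
  void open(InputStream in) throws IOException;
  T deserialize(T t) throws IOException;
  void close() throws IOException;
}

// Factory through which the framework obtains per-use instances; any
// sharing of library handles behind this is X's responsibility.
interface Serialization<T> {
  Serializer<T> getSerializer(Class<T> c);
  Deserializer<T> getDeserializer(Class<T> c);
}

class GenericClient {
  // Generic client code that replicates no platform logic: it never
  // constructs T itself and never assumes a no-arg constructor.
  static <T> void readAll(Deserializer<T> d, InputStream in, long count)
      throws IOException {
    d.open(in);
    T t = null;
    for (long i = 0; i < count; i++) {
      t = d.deserialize(t); // first call creates t; later calls may reuse it
      // ... hand t to the framework ...
    }
    d.close();
  }
}
{code}

With a contract like this, object creation stays under the platform's control (it can return a fresh object or reuse the one passed in), while the framework's client code stays generic. The price is exactly the two constraints above: a platform that can only fill in caller-created objects, or only create its own, can't implement deserialize() in both modes.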