[jira] Commented: (HADOOP-1986) Add support for a general serialization mechanism for Map Reduce

Vivek Ratan (JIRA) Thu, 25 Oct 2007 22:15:14 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537831
 ]


Vivek Ratan commented on HADOOP-1986:
-------------------------------------

Things get difficult if you want to use a singleton serializer for more than 
one class. In your example, suppose that _RecordSerializer_ is the Record I/O 
serializer, and can serialize any class that derives from _Record_. If I want 
to use Record I/O to serialize all my classes (my key, my value, my 
intermediate key, my intermediate value, etc), then with your scheme, we'd 
create one _RecordSerializer_ object per class that we want to serialize, so 
one for my intermediate map keys, one for my intermediate map values, and so 
on. As we've discussed earlier, serializer objects can contain state (an input 
or output stream, that they keep open across each serialization, for example). 
So having multiple _RecordSerializer_ objects can be a problem, especially if 
more than one serializes to the same stream. It's quite plausible that we may 
want a singleton _RecordSerializer_ object. But if it can store only one class 
in its private _recordClass_ variable, then I can't use a singleton object to 
serialize multiple classes. 
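
To make the problem concrete, here is a minimal sketch of a serializer that is bound to a single class. _Record_, _RecordSerializer_ and _recordClass_ are the names used above; the _MapKey_ and _MapValue_ classes, the constructor signature and the type check are assumptions made up for illustration, not code from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for the Record I/O base class.
abstract class Record {
    abstract void write(DataOutputStream out) throws IOException;
}

// Hypothetical intermediate key type.
class MapKey extends Record {
    final int id;
    MapKey(int id) { this.id = id; }
    @Override void write(DataOutputStream out) throws IOException { out.writeInt(id); }
}

// Hypothetical intermediate value type.
class MapValue extends Record {
    final String text;
    MapValue(String text) { this.text = text; }
    @Override void write(DataOutputStream out) throws IOException { out.writeUTF(text); }
}

// A serializer tied to exactly one class through its private recordClass field.
class RecordSerializer {
    private final Class<? extends Record> recordClass;
    private final DataOutputStream out; // state kept open across serializations

    RecordSerializer(Class<? extends Record> recordClass, DataOutputStream out) {
        this.recordClass = recordClass;
        this.out = out;
    }

    void serialize(Record r) throws IOException {
        // Assumed behaviour: refuse anything that is not the bound class.
        if (!recordClass.isInstance(r)) {
            throw new IllegalArgumentException(
                "bound to " + recordClass.getSimpleName()
                + ", got " + r.getClass().getSimpleName());
        }
        r.write(out);
    }
}

class BoundSerializerDemo {
    public static void main(String[] args) throws IOException {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        RecordSerializer keySerializer = new RecordSerializer(MapKey.class, out);
        keySerializer.serialize(new MapKey(7)); // accepted
        try {
            keySerializer.serialize(new MapValue("v")); // same object, different class
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this shape, keys and values each need their own _RecordSerializer_ instance, even when both are meant to write to the same stream.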

All I'm saying is that if we associate one serializer object with one class, we 
lose the ability to share serializer objects across classes, which seems quite 
stifling. I'm also arguing that we do want clients to explicitly code for 
the two different kinds of serializers, so that both memory management and the 
performance impact are clearer (it's good to know who is responsible for 
creating which objects, so that we can minimize object creation).
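
As a rough sketch of the alternative, the serializer below stores no class at all, so a single object can handle every _Record_ subclass on one stream. The read side fills an object supplied by the caller, which is one possible reading of "who is responsible for creating what objects"; this comment does not spell out the two kinds of serializers, so that reading is an assumption. All class and method names other than _Record_ are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for the Record I/O base class.
abstract class Record {
    abstract void write(DataOutputStream out) throws IOException;
    abstract void readFields(DataInputStream in) throws IOException;
}

// Hypothetical key type.
class IntKey extends Record {
    int id;
    IntKey(int id) { this.id = id; }
    @Override void write(DataOutputStream out) throws IOException { out.writeInt(id); }
    @Override void readFields(DataInputStream in) throws IOException { id = in.readInt(); }
}

// Hypothetical value type.
class TextValue extends Record {
    String text;
    TextValue(String text) { this.text = text; }
    @Override void write(DataOutputStream out) throws IOException { out.writeUTF(text); }
    @Override void readFields(DataInputStream in) throws IOException { text = in.readUTF(); }
}

// One object, one open stream, any Record subclass: no recordClass field.
final class SharedRecordSerializer {
    private final DataOutputStream out;
    SharedRecordSerializer(DataOutputStream out) { this.out = out; }
    void serialize(Record r) throws IOException { r.write(out); }
}

// Read side under the assumed "caller owns the object" contract:
// nothing is allocated here, the caller's object is overwritten in place.
final class SharedRecordDeserializer {
    private final DataInputStream in;
    SharedRecordDeserializer(DataInputStream in) { this.in = in; }
    void deserializeInto(Record reuse) throws IOException { reuse.readFields(in); }
}

class SharedSerializerDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        SharedRecordSerializer ser = new SharedRecordSerializer(new DataOutputStream(buf));
        ser.serialize(new IntKey(42));       // key and value go through
        ser.serialize(new TextValue("abc")); // the same serializer object

        SharedRecordDeserializer de = new SharedRecordDeserializer(
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        IntKey key = new IntKey(0);          // allocated once by the caller
        TextValue value = new TextValue("");
        de.deserializeInto(key);
        de.deserializeInto(value);
        System.out.println(key.id + " " + value.text);
    }
}
```

Because the caller allocates _key_ and _value_ once and passes them in, it is visible in the client code where objects are created, which is the point about making memory management explicit.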

> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.16.0
>
>         Attachments: SerializableWritable.java, serializer-v1.patch
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable 
> key-value pairs. While it's possible to write Writable wrappers for other 
> serialization frameworks (such as Thrift), this is not very convenient: it 
> would be nicer to be able to use arbitrary types directly, without explicit 
> wrapping and unwrapping.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
