[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934523#action_12934523 ]
Doug Cutting commented on HADOOP-6685:
--------------------------------------

> I question the validity of Doug's veto. His objection to the patch has
> nothing to do with the merits of the patch and everything to do with his wish
> to push Avro into Hadoop at the cost of the users.

I have withdrawn my hope to add Avro as a data format in Hadoop, since the Avro project now provides a Hadoop data format layered on Hadoop. To my knowledge there are currently no Hadoop issues pushing Avro into Hadoop as a data format besides this issue, and I do not currently intend to file any new such Hadoop issues. (Avro's layer would be more easily implemented if the shuffle better supported non-Writable data, but, as it stands, it is adequate.)

We should refrain from adding any new data formats to the Hadoop kernel. More generally, we should refrain from adding to the kernel code that could be implemented as user code. At present, the kernel must contain some framework code that runs in a user's tasks, e.g., sorting code that calls the user's comparator. Beyond that required framework code, however, code that runs in user tasks should not be provided with the system, but should rather be supplied by the user. User tasks should ideally be able to, e.g., run a different version of the HDFS client code.

We have a fair amount of legacy code, like SequenceFile, that is currently provided with the system and that we cannot immediately remove for compatibility reasons. But new user-level functionality should be provided as external packages, not shipped with the kernel. If we wish to enhance the SequenceFile data format, that should be done in a separate project. The line between user and system code is currently blurred, and we should work to clarify it and reduce the amount of user code in this project, providing a level playing field for user-code libraries.
> Change the generic serialization framework API to use serialization-specific
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch,
> serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization-specific configuration. Since this data is really internal
> to the specific serialization, I think we should change it to be an opaque
> binary blob. This will simplify the interface for defining specific
> serializations for different contexts (MAPREDUCE-1462). It will also move us
> toward having serialized objects for Mappers, Reducers, etc. (MAPREDUCE-1183).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
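The change the issue describes — carrying serialization-specific configuration as an opaque byte blob the framework stores verbatim, instead of a Map<String,String> it must understand — can be sketched roughly as below. This is a minimal illustrative sketch, not Hadoop's actual serialization API: the interface and class names (SerializationMetadata, AvroSchemaMetadata, OpaqueMetadataDemo) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposed shape: the framework sees only an
// opaque byte[]; the meaning of the bytes is internal to one serialization.
interface SerializationMetadata {
    byte[] toBytes(); // opaque to the framework; only the owning serialization parses it
}

// Illustrative serialization-specific metadata, e.g. an Avro schema as JSON.
final class AvroSchemaMetadata implements SerializationMetadata {
    private final String schemaJson;

    AvroSchemaMetadata(String schemaJson) {
        this.schemaJson = schemaJson;
    }

    @Override
    public byte[] toBytes() {
        // The framework stores this blob verbatim; it never inspects the contents.
        return schemaJson.getBytes(StandardCharsets.UTF_8);
    }

    static AvroSchemaMetadata fromBytes(byte[] blob) {
        // Only the matching serialization knows how to decode the blob.
        return new AvroSchemaMetadata(new String(blob, StandardCharsets.UTF_8));
    }

    String schemaJson() {
        return schemaJson;
    }
}

public class OpaqueMetadataDemo {
    public static void main(String[] args) {
        AvroSchemaMetadata md = new AvroSchemaMetadata("{\"type\":\"string\"}");
        byte[] blob = md.toBytes();                        // framework stores the blob
        AvroSchemaMetadata restored = AvroSchemaMetadata.fromBytes(blob);
        System.out.println(restored.schemaJson().equals(md.schemaJson()));
    }
}
```

The design point is that the Map<String,String> version forces a key/value vocabulary into the framework's contract, while a blob keeps the framework's contract to "store and hand back these bytes," which is what makes it simpler for context-specific serializations (MAPREDUCE-1462) and for serialized Mapper/Reducer objects (MAPREDUCE-1183).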