[
https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933627#action_12933627
]
Doug Cutting commented on HADOOP-6685:
--------------------------------------
Eli, thanks for refocusing discussion on the original objective of this issue.
We've gotten distracted by other aspects not essential to that, and I don't
think that primary objective has ever been agreed on.
I do not agree that an array of bytes is a better way to represent
serialization metadata. (I stated this in the first comment on this issue.) I
prefer the solutions that were in HADOOP-6165 and HADOOP-6420. My objections
are:
- serialization implementations use inheritance, and inheritance is harder to
support with binary metadata
- binary encodings are less transparent and create a bootstrap problem: the
binary serialization metadata itself needs a serialization
- serialization metadata is neither large nor read/written in inner loops, so
a binary encoding is not required
- using a binary encoding for serialization metadata will require substantial
changes to serialization clients. The Map<String,String> approach is easily
embedded in existing metadata, like configurations, jobs, SequenceFiles, etc.;
a binary encoding requires changes to all serialization clients, with little
to compensate. Changes to public APIs and persistent data formats should be
made only when there is clear end-user value, and I don't see that a change
from Map<String,String> to byte[] provides any.
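To make the embedding point concrete, here is a minimal plain-Java sketch. It
is not Hadoop's serialization API: a Map<String,String> stands in for a
string-valued store such as a configuration, and the class name, the key
prefix, and the use of Base64 for the binary case are all hypothetical choices
made for illustration.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class MetadataEmbedding {
  // Hypothetical key prefix chosen for illustration; not a real Hadoop key.
  static final String PREFIX = "example.serialization.";

  // Map<String,String> metadata: each entry becomes a readable string entry.
  static void embedMap(Map<String, String> conf, Map<String, String> metadata) {
    for (Map.Entry<String, String> e : metadata.entrySet()) {
      conf.put(PREFIX + e.getKey(), e.getValue());
    }
  }

  // Recover the metadata by stripping the prefix from matching keys.
  static Map<String, String> extractMap(Map<String, String> conf) {
    Map<String, String> metadata = new HashMap<>();
    for (Map.Entry<String, String> e : conf.entrySet()) {
      if (e.getKey().startsWith(PREFIX)) {
        metadata.put(e.getKey().substring(PREFIX.length()), e.getValue());
      }
    }
    return metadata;
  }

  // byte[] metadata: the opaque blob must first be text-encoded (Base64 is an
  // assumption here) before it fits into a string-valued store.
  static void embedBytes(Map<String, String> conf, byte[] metadata) {
    conf.put(PREFIX + "blob", Base64.getEncoder().encodeToString(metadata));
  }

  // Every client must know to decode the blob before it can interpret it.
  static byte[] extractBytes(Map<String, String> conf) {
    return Base64.getDecoder().decode(conf.get(PREFIX + "blob"));
  }

  public static void main(String[] args) {
    Map<String, String> metadata = new HashMap<>();
    metadata.put("class", "org.example.Record"); // hypothetical class name
    Map<String, String> readable = new HashMap<>();
    embedMap(readable, metadata);
    System.out.println(readable); // prints {example.serialization.class=org.example.Record}

    Map<String, String> opaque = new HashMap<>();
    embedBytes(opaque, new byte[] {0x01, 0x02});
    System.out.println(opaque); // prints {example.serialization.blob=AQI=}
  }
}
```

The map entries stay human-readable inside the store, while the byte[] version
appears only as an encoded string that each client must know how to decode.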
I also reiterate my objection that the current patch makes a large number of
changes beyond changing the format of serialization metadata. We should
restrict the patch to what the description covers and make other changes in
other issues.
> Change the generic serialization framework API to use serialization-specific
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-6685
> URL: https://issues.apache.org/jira/browse/HADOOP-6685
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Fix For: 0.22.0
>
> Attachments: libthrift.jar, serial.patch, serial4.patch,
> serial6.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization-specific configuration. Since this data is really internal
> to the specific serialization, I think we should change it to be an opaque
> binary blob. This will simplify the interface for defining specific
> serializations for different contexts (MAPREDUCE-1462). It will also move us
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).
--
This message is automatically generated by JIRA.