[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931588#action_12931588 ]
Owen O'Malley commented on HADOOP-6685:
---------------------------------------
{quote}
Owen, thanks for the slides
{quote}
You're welcome. Everyone had seen them before, but I wanted to make sure they
were easily available for this conversation.
{quote}
I don't see a direct relation between this issue and the issue of simplifying
the implementation of efficient map-side joins (MAPREDUCE-1183, more or less).
Am I missing the connection, or is this a distinct issue?
{quote}
It is related because we want to support context-specific serializations. That
support is much easier to provide if the metadata for each serialization lives
in a separate structure rather than being dumped into the Configuration.
MAPREDUCE-1183 raises the same problem for InputFormats, Mappers, etc. The
issues are similar, and it would be nice to have a consistent solution.
{quote}
File formats are forever.
{quote}
I'm adding no new file formats. I'm just making the ones that we've had for
years more useful.
{quote}
We badly need to add support for a higher-level object serialization system
than Writable.
{quote}
I obviously agree enough that I'm working on supporting it. Giving users a
choice of serialization is much richer than forcing them into a single one.
Each serialization makes different design decisions; by making the choice
pluggable, the *user* can decide. I understand that you want Avro everywhere.
Other users have other priorities.
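To illustrate what "making the choice pluggable" means, here is a minimal sketch. The names `SimpleSerialization`, `CharsetSerialization`, and `SerializationRegistry` are hypothetical and are not Hadoop's actual `org.apache.hadoop.io.serializer` API. The framework looks a serialization up by name, so the user's configuration, not the framework, decides which implementation is used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical pluggable interface for this sketch; NOT Hadoop's real API.
interface SimpleSerialization<T> {
    String name();
    byte[] serialize(T value);
    T deserialize(byte[] bytes);
}

// One family of interchangeable implementations: text encoded with a given
// charset. Different charsets make different trade-offs (e.g. size).
final class CharsetSerialization implements SimpleSerialization<String> {
    private final Charset charset;

    CharsetSerialization(Charset charset) {
        this.charset = charset;
    }

    @Override
    public String name() {
        return charset.name(); // canonical name, e.g. "UTF-8"
    }

    @Override
    public byte[] serialize(String value) {
        return value.getBytes(charset);
    }

    @Override
    public String deserialize(byte[] bytes) {
        return new String(bytes, charset);
    }
}

// The framework only knows the interface; it resolves the user's choice by name.
final class SerializationRegistry {
    private final Map<String, SimpleSerialization<String>> byName = new HashMap<>();

    void register(SimpleSerialization<String> serialization) {
        byName.put(serialization.name(), serialization);
    }

    SimpleSerialization<String> lookup(String name) {
        SimpleSerialization<String> s = byName.get(name);
        if (s == null) {
            throw new IllegalArgumentException("no serialization registered as " + name);
        }
        return s;
    }
}
```

For example, after registering `new CharsetSerialization(StandardCharsets.UTF_8)` and `new CharsetSerialization(StandardCharsets.UTF_16BE)`, a user who configures `"UTF-16BE"` gets 4 bytes for `"hi"`, while `"UTF-8"` gives 2. Neither the registry nor the calling code changes.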
{quote}
But I'm not convinced it's wise to add such support to the existing Java-only
container file formats.
{quote}
I'm supporting the containers we have. I'd love for someone to implement
SequenceFiles or TFiles in C. That is an orthogonal issue. Any file format that
only supports one serialization doesn't meet my needs.
This change should have no impact on any current applications. Very few of them
depend on the serialization library directly. My hope is that by extending the
library and supporting a wider range of serializations, users will be able to
code their applications using the types that *they* find convenient.
> Change the generic serialization framework API to use serialization-specific
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-6685
> URL: https://issues.apache.org/jira/browse/HADOOP-6685
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Attachments: serial.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization specific configuration. Since this data is really internal
> to the specific serialization, I think we should change it to be an opaque
> binary blob. This will simplify the interface for defining specific
> serializations for different contexts (MAPREDUCE-1462). It will also move us
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).
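As a minimal sketch of the opaque-blob idea, assuming hypothetical names (`SchemaMetadata`, `ContextMetadataStore`, and context labels such as `map.output.key`) that are not part of Hadoop's API: each serialization encodes its own metadata into a `byte[]` using `DataOutputStream`, and the framework stores one blob per context without ever parsing it. Only the serialization that wrote a blob knows how to read it back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical serialization-specific metadata; its layout is private to
// the serialization that owns it.
final class SchemaMetadata {
    final String schema;
    final boolean compressed;

    SchemaMetadata(String schema, boolean compressed) {
        this.schema = schema;
        this.compressed = compressed;
    }

    // Encode to an opaque blob; the framework never looks inside.
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(schema);
            out.writeBoolean(compressed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decode a blob previously produced by toBytes().
    static SchemaMetadata fromBytes(byte[] blob) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            String schema = in.readUTF();
            boolean compressed = in.readBoolean();
            return new SchemaMetadata(schema, compressed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Framework side: one opaque blob per context, with no shared key namespace.
final class ContextMetadataStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    void put(String context, byte[] blob) {
        blobs.put(context, blob.clone()); // defensive copy
    }

    byte[] get(String context) {
        byte[] blob = blobs.get(context);
        return blob == null ? null : blob.clone();
    }
}
```

Because the framework treats the metadata as opaque, a serialization can change its metadata layout without touching the framework. Two contexts can also use the same serialization with different settings without colliding in a flat `Map<String,String>`.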