[
https://issues.apache.org/jira/browse/AVRO-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881606#action_12881606
]
Iván de Prado commented on AVRO-493:
------------------------------------
> Is that because of AVRO-534, or something else?
No. The problem is that the Configuration is not available in
DeserializerComparator, so it is not possible to instantiate a
SerializationFactory with "new SerializationFactory(jobConf)".
>> Otherwise you would need a different Avro schema with a different sorting
>> for each kind of grouping you want to do in the reducer.
> Is that so bad?
Consider the following use case: you manage a data file of people's Profiles.
Each Profile has many fields. Common operations could be grouping by address,
grouping by name, and other combinations. Maintaining a separate Avro schema
file for each required sort order is not reasonable (it would also mean
maintaining several specific Java classes).
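To make the point concrete, here is a minimal sketch in plain Java (no Avro or Hadoop classes involved). The Profile class, its fields, and the comparator names are hypothetical and only illustrate the idea: one record type, with a different ordering supplied for each grouping, instead of one schema per ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ProfileOrderings {
    // Hypothetical stand-in for a Profile record; the fields are illustrative only.
    static final class Profile {
        final String name;
        final String address;
        final int age;

        Profile(String name, String address, int age) {
            this.name = name;
            this.address = address;
            this.age = age;
        }
    }

    // One ordering per grouping, all defined against the same record type.
    static final Comparator<Profile> BY_ADDRESS =
            Comparator.comparing((Profile p) -> p.address);
    static final Comparator<Profile> BY_NAME =
            Comparator.comparing((Profile p) -> p.name);
    static final Comparator<Profile> BY_ADDRESS_THEN_NAME =
            BY_ADDRESS.thenComparing(BY_NAME);

    // Returns the profile names in the order imposed by the given comparator.
    static List<String> namesSortedBy(List<Profile> profiles, Comparator<Profile> order) {
        List<Profile> copy = new ArrayList<>(profiles);
        copy.sort(order);
        List<String> names = new ArrayList<>();
        for (Profile p : copy) {
            names.add(p.name);
        }
        return names;
    }

    // Small made-up data set for the demonstration.
    static List<Profile> sample() {
        return Arrays.asList(
                new Profile("Carol", "Elm St", 41),
                new Profile("Alice", "Oak Ave", 30),
                new Profile("Bob", "Elm St", 25));
    }

    public static void main(String[] args) {
        System.out.println(namesSortedBy(sample(), BY_NAME));
        System.out.println(namesSortedBy(sample(), BY_ADDRESS_THEN_NAME));
    }
}
```

With orderings supplied at job setup like this, adding a new grouping means adding one comparator rather than a new schema file and a new generated class.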
> Would AVRO-581 help?
Well, I couldn't tell you. It would depend on the implementation details. It
would be useful if Avro's Hadoop implementation supported <key, value> pairs,
including ways to provide your own partitioner, group comparator and key
comparator.
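As a rough illustration of how those three hooks work together, the sketch below simulates the shuffle in plain Java. It does not use the Hadoop or Avro APIs; the Key class, the method names, and the data are all assumptions made for the example. The partitioner routes on the address only, the key comparator sorts on (address, name), and the group comparator decides where one reduce group ends and the next begins.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ShuffleSketch {
    // Hypothetical composite key: group on address, sort within a group by name.
    static final class Key {
        final String address;
        final String name;

        Key(String address, String name) {
            this.address = address;
            this.name = name;
        }
    }

    // Partitioner: hash only the grouping field so a whole group lands in one partition.
    static int partition(Key k, int numPartitions) {
        return (k.address.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Key comparator: full sort order seen by the reducer.
    static final Comparator<Key> SORT =
            Comparator.comparing((Key k) -> k.address).thenComparing(k -> k.name);

    // Group comparator: keys that compare equal here share one reduce call.
    static final Comparator<Key> GROUP =
            Comparator.comparing((Key k) -> k.address);

    // Simulates a single reducer: sort the keys, then open a new group
    // whenever the group comparator says the key differs from the previous one.
    static List<List<String>> reduceGroups(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<String>> groups = new ArrayList<>();
        Key prev = null;
        for (Key k : sorted) {
            if (prev == null || GROUP.compare(prev, k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k.name);
            prev = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = new ArrayList<>();
        keys.add(new Key("Elm St", "Carol"));
        keys.add(new Key("Oak Ave", "Alice"));
        keys.add(new Key("Elm St", "Bob"));
        System.out.println(reduceGroups(keys));
    }
}
```

The point of keeping the three pieces separate is that the sort order can be finer than the grouping, so values arrive at the reducer already ordered within each group.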
> hadoop mapreduce support for avro data
> --------------------------------------
>
> Key: AVRO-493
> URL: https://issues.apache.org/jira/browse/AVRO-493
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Fix For: 1.4.0
>
> Attachments: AVRO-493.patch, AVRO-493.patch
>
>
> Avro should provide support for using Hadoop MapReduce over Avro data files.