[ https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829251#action_12829251 ]
Jay Booth commented on MAPREDUCE-1126:
--------------------------------------

+1 for the general concept of a lower-level API, great idea.

Any thoughts regarding explicitly setting a Mapper per Split? Joins between different formats are a pretty primary use case, and it's always awkward using MultipleInputs to shoehorn the different classes into a single conf. As I understand it now, with MultipleInputs, the MapTask wakes up, looks at its input split, compares that to a magic configuration field mapping splits to mapper classes, and instantiates that mapper class. That leads to trouble if you need to mix it with, say, CombineFileInputFormat or anything else that relies on configuration, since the different static setConfigValue(conf) methods set a single value, assuming a single mapper class.

If we set a specific mapper class per split, and then a specific config per mapper class, I think it would be a lot more flexible for combining different types of functionality if you're a framework author. If you're just a user, maybe you don't want to deal with the extra environment setup for simple jobs, but if this is a lower-level API, maybe it could be useful? It would certainly be cleaner if a single-input job were just an N=1 multiple-inputs job, rather than the current situation where a multiple-inputs job is a configuration-level hack on top of the single-input framework.

> shuffle should use serialization to get comparator
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1126
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Doug Cutting
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, MAPREDUCE-1126.patch, MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class. Instead we should use the Serialization API to create key comparators. This would permit, e.g., Avro-based comparators to be used, permitting efficient sorting of complex data types without having to write a RawComparator in Java.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
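
For illustration only, the per-split mapper idea from the comment above could be sketched roughly as below. This is plain Java with no Hadoop dependencies, and every name in it (PerSplitDispatch, Split, RecordMapper, firstField, the "delimiter" key) is hypothetical and not part of any Hadoop API or of the attached patches. It only shows each split carrying its own mapper factory and its own config, so that no job-wide split-to-mapper lookup is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

public class PerSplitDispatch {

    /** Hypothetical stand-in for a mapper: one input record in, one output record out. */
    interface RecordMapper {
        String map(String record);
    }

    /** Hypothetical split that carries its own config and its own mapper factory. */
    static final class Split {
        final String name;
        final Map<String, String> conf;
        final Function<Map<String, String>, RecordMapper> mapperFactory;

        Split(String name, Map<String, String> conf,
              Function<Map<String, String>, RecordMapper> mapperFactory) {
            this.name = name;
            this.conf = conf;
            this.mapperFactory = mapperFactory;
        }
    }

    /** Example mapper factory: emits the first field, using the delimiter from the split's own config. */
    static RecordMapper firstField(Map<String, String> conf) {
        String delimiter = conf.getOrDefault("delimiter", ",");
        return record -> record.split(Pattern.quote(delimiter), 2)[0];
    }

    /** Runs each split with the mapper bound to it; there is no global split-to-mapper table. */
    static List<String> run(List<Split> splits, Map<String, List<String>> recordsBySplit) {
        List<String> out = new ArrayList<>();
        for (Split split : splits) {
            // The mapper is built from this split's config only.
            RecordMapper mapper = split.mapperFactory.apply(split.conf);
            for (String record : recordsBySplit.getOrDefault(split.name, List.of())) {
                out.add(mapper.map(record));
            }
        }
        return out;
    }

    /** Two splits in different formats, each configured independently. */
    static List<String> demo() {
        List<Split> splits = List.of(
            new Split("users.csv", Map.of("delimiter", ","), PerSplitDispatch::firstField),
            new Split("users.tsv", Map.of("delimiter", "\t"), PerSplitDispatch::firstField));
        Map<String, List<String>> records = Map.of(
            "users.csv", List.of("alice,30"),
            "users.tsv", List.of("bob\t25"));
        return run(splits, records);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch a single-input job is simply a list containing one Split, which is the N=1 case mentioned in the comment.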