[ https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833284#action_12833284 ]
Arun C Murthy commented on MAPREDUCE-326:
-----------------------------------------

First up, what is the driver for this proposal? I haven't heard too many asks from 'higher-level' apis. Joydeep mentioned a use-case for which we could provide the framework sort/merge etc. as libraries - that would suffice, no? I completely agree with the need for making some of them available, e.g. MAPREDUCE-1220 already starts on that path. Anything else? I'd like to understand more before we jump in and put in *lots* of new apis we need to maintain - whether *public* or not.

----

Tom, looking through your proposal there are several issues which I see as inherent with the notion of a lower-level 'binary' api as proposed here. For example:

{code}
public void write(K key, V value) throws IOException, InterruptedException {
  int partition = partitioner.getPartition(key, value, partitions);
  keySerializer.serialize(key);
  valSerializer.serialize(value);
  rawMapOutputCollector.collect(keyBuffer, valueBuffer, partition);
}
{code}

This is completely unworkable: it adds an extra buffer copy from the map's <key, value> into the sort buffer (io.sort.mb) for the 'high-level' api. This will take us backwards and will undo lots of the work we've done in the past to optimize the Map-Reduce data-path: HADOOP-2919, HADOOP-2095, HADOOP-3365 etc.

----

{quote}
This step includes removing the generic types in some framework classes, since they only operate on bytes. For example, MapOutputBuffer's collect() method changes to have a key and value of type DataInputBuffer, which allows it to call the append(DataInputBuffer key, DataInputBuffer value) method on IFile.
{quote}

I'm not sure I follow. What is the relation between MapOutputBuffer.collect and IFile.append? Are you proposing we do away with the sort?
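
The extra-copy concern about the proposed write() can be illustrated with a small, self-contained sketch. This is not Hadoop code: the class and method names (ExtraCopySketch, collectViaIntermediateBuffers, collectDirect) are hypothetical, plain java.io streams stand in for the serializers and for the sort buffer, and String keys and values are assumed for simplicity. It only shows that serializing into intermediate key/value buffers and then handing those buffers to a collector writes every byte twice, whereas serializing straight into the destination buffer writes it once.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class ExtraCopySketch {

    /** Final "sort buffer" contents plus the total bytes written across all buffers. */
    static final class Result {
        final byte[] sortBufferContents;
        final int bytesWritten;

        Result(byte[] sortBufferContents, int bytesWritten) {
            this.sortBufferContents = sortBufferContents;
            this.bytesWritten = bytesWritten;
        }
    }

    /** Path A: serialize into intermediate buffers, then copy them into the sort buffer. */
    static Result collectViaIntermediateBuffers(String key, String value) {
        try {
            ByteArrayOutputStream keyBuffer = new ByteArrayOutputStream();
            ByteArrayOutputStream valueBuffer = new ByteArrayOutputStream();
            new DataOutputStream(keyBuffer).writeUTF(key);     // first write of the key bytes
            new DataOutputStream(valueBuffer).writeUTF(value); // first write of the value bytes
            int serialized = keyBuffer.size() + valueBuffer.size();

            ByteArrayOutputStream sortBuffer = new ByteArrayOutputStream();
            keyBuffer.writeTo(sortBuffer);   // second write of the same key bytes
            valueBuffer.writeTo(sortBuffer); // second write of the same value bytes
            return new Result(sortBuffer.toByteArray(), serialized + sortBuffer.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Path B: serialize straight into the sort buffer, so each byte is written once. */
    static Result collectDirect(String key, String value) {
        try {
            ByteArrayOutputStream sortBuffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(sortBuffer);
            out.writeUTF(key);
            out.writeUTF(value);
            return new Result(sortBuffer.toByteArray(), sortBuffer.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Result viaBuffers = collectViaIntermediateBuffers("k1", "v1");
        Result direct = collectDirect("k1", "v1");
        System.out.println("same sort-buffer contents: "
                + Arrays.equals(viaBuffers.sortBufferContents, direct.sortBufferContents));
        System.out.println("bytes written via intermediate buffers: " + viaBuffers.bytesWritten);
        System.out.println("bytes written directly: " + direct.bytesWritten);
    }
}
```

Both paths leave identical bytes in the destination buffer; the difference is only in how many times each record is written on the way there, which is the per-record cost the comment above is concerned about.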

> The lowest level map-reduce APIs should be byte oriented
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-326
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: eric baldeschwieler
>         Attachments: MAPREDUCE-326-api.patch, MAPREDUCE-326.pdf
>
>
> As discussed here:
> https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
> The templates, serializers and other complexities that allow map-reduce to
> use arbitrary types complicate the design and lead to lots of object creates
> and other overhead that a byte oriented design would not suffer. I believe
> the lowest level implementation of hadoop map-reduce should have byte string
> oriented APIs (for keys and values). This API would be more performant,
> simpler and more easily cross language.
> The existing API could be maintained as a thin layer on top of the leaner API.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.