[jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented

Jay Booth (JIRA) Thu, 04 Feb 2010 13:05:54 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829779#action_12829779
 ]


Jay Booth commented on MAPREDUCE-326:
-------------------------------------

This sounds awesome.

Is this roughly the workflow you're envisioning?

1)  Kernel starts the map process and sends Split information via stdin
2)  Framework reads the Split info, uses it to instantiate the InputFormat and 
userland Mapper class, and runs Map with output going to stdout
3)  Kernel routes the map output to the different partitions
4)  Kernel executes the shuffle; framework or kernel does the sort (TBD?  Maybe 
the kernel defaults to byte[] comparison but allows the framework to override?)
5)  Kernel starts the reduce process; framework reads some sort of ReduceSplit 
with partition info and creates the userland Reducer
6)  Framework executes the userland Reducer and pipes its output through the 
kernel to the reduce output location
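
To make the question concrete, here is a rough single-process sketch of steps 
1-6 in Python.  Everything in it is an assumption for illustration only: 
run_job, map_fn, reduce_fn, sort_key and the crc32 partitioner are hypothetical 
names, not anything in Hadoop, and the real kernel/framework split would be 
across processes talking over stdin/stdout rather than function calls.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def run_job(records, map_fn, reduce_fn, num_partitions=2, sort_key=None):
    """Hypothetical in-memory 'kernel'; keys and values are raw bytes."""
    # Default ordering is plain byte comparison; a framework may override it.
    sort_key = sort_key or (lambda key: key)

    # Steps 1-3: run the userland mapper and route each pair to a partition.
    partitions = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            index = zlib.crc32(out_key) % num_partitions
            partitions[index].append((out_key, out_value))

    # Steps 4-6: sort each partition, group equal keys, run the reducer.
    output = []
    for index in sorted(partitions):
        # Ties under sort_key are broken by raw bytes so equal keys stay adjacent.
        pairs = sorted(partitions[index],
                       key=lambda kv: (sort_key(kv[0]), kv[0]))
        for out_key, group in groupby(pairs, key=itemgetter(0)):
            output.extend(reduce_fn(out_key, [value for _, value in group]))
    return output


def word_count_map(_key, line):
    # Userland mapper: emits (word, b"1") for every word in the line.
    for word in line.split():
        yield word, b"1"


def word_count_reduce(word, counts):
    # Userland reducer: sums the byte-encoded counts for one word.
    yield word, str(sum(int(c) for c in counts)).encode()


records = [(b"0", b"b a b"), (b"1", b"a c")]
result = run_job(records, word_count_map, word_count_reduce, num_partitions=1)
# result == [(b"a", b"2"), (b"b", b"2"), (b"c", b"1")]
```

Passing sort_key (for example sort_key=lambda k: -len(k)) stands in for the 
"framework overrides the comparison" option in step 4; leaving it out gives the 
byte[] default.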

Is that more or less accurate?  I think it'd be awesome; being able to run 
tasks in different languages is going to become more and more important.  
JRuby and Clojure are good to go right now as far as a DFSClient is concerned, 
and other languages are doable via dfs -cat, as Doug said.  This would be huge 
for us: it would reduce development time and make the logic of our MR jobs more 
accessible to business people.

> The lowest level map-reduce APIs should be byte oriented
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-326
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: eric baldeschwieler
>
> As discussed here:
> https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
> The templates, serializers and other complexities that allow map-reduce to 
> use arbitrary types complicate the design and lead to lots of object creates 
> and other overhead that a byte oriented design would not suffer.  I believe 
> the lowest level implementation of hadoop map-reduce should have byte string 
> oriented APIs (for keys and values).  This API would be more performant, 
> simpler and more easily cross language.
> The existing API could be maintained as a thin layer on top of the leaner API.
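
One way to picture the "thin layer on top of the leaner API" idea from the 
description above, as a minimal Python sketch.  All names here 
(TypedMapperAdapter, encode_long, decode_long, typed_word_map) are hypothetical 
and only illustrate the layering; they are not an existing or proposed Hadoop 
interface.

```python
import struct


class TypedMapperAdapter:
    """Hypothetical adapter: exposes a typed map function as a bytes-only one."""

    def __init__(self, typed_map, decode_key, decode_value,
                 encode_key, encode_value):
        self.typed_map = typed_map
        self.decode_key = decode_key
        self.decode_value = decode_value
        self.encode_key = encode_key
        self.encode_value = encode_value

    def map(self, key: bytes, value: bytes):
        # Deserialize once on the way in, serialize once on the way out;
        # the layer underneath only ever sees bytes.
        typed_key = self.decode_key(key)
        typed_value = self.decode_value(value)
        for out_key, out_value in self.typed_map(typed_key, typed_value):
            yield self.encode_key(out_key), self.encode_value(out_value)


def encode_long(number: int) -> bytes:
    # 8-byte big-endian signed integer (an arbitrary choice for this sketch).
    return struct.pack(">q", number)


def decode_long(raw: bytes) -> int:
    return struct.unpack(">q", raw)[0]


def typed_word_map(offset: int, line: str):
    # Typed userland code: (int, str) in, (str, int) out.
    for word in line.split():
        yield word, 1


adapter = TypedMapperAdapter(
    typed_word_map,
    decode_key=decode_long,
    decode_value=lambda raw: raw.decode("utf-8"),
    encode_key=lambda text: text.encode("utf-8"),
    encode_value=encode_long,
)
pairs = list(adapter.map(encode_long(0), b"hi there"))
```

Userland code that wants typed keys and values pays for serialization in the 
adapter, while code that is happy with bytes can skip the adapter entirely.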

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
