[jira] [Commented] (MAPREDUCE-2841) Task level native optimization

He Yongqiang (JIRA) Mon, 29 Aug 2011 13:36:07 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093174#comment-13093174
 ]


He Yongqiang commented on MAPREDUCE-2841:
-----------------------------------------

bq. The bucketed sort used from 0.10 to 0.16 had more internal fragmentation 
and a less predictable memory footprint (particularly for jobs with lots of 
reducers).

If the Java implementation used the same design as the C++ one here, the only 
difference would be the language, right? Sorry, can you explain in more detail 
how the C++ implementation can do a better job on predictable memory footprint? 
In the current Java implementation, all records (no matter which reducer they 
are going to) are stored in one central byte array. In the C++ implementation, 
within one mapper task, each reducer has a corresponding partition bucket that 
maintains its own memory buffer. From what I understand, one partition bucket 
serves one reducer, and all records going to that reducer from the current map 
task are stored there, then sorted and spilled from there. The benefit on the 
sort side is that it saves comparisons, since the original sort has to compare 
records from different reducers. The C++ implementation also has a trick of 
doing prefix comparison, which reduces the number of CPU ops (an 8-byte compare 
becomes one long compare op).
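
To make the prefix comparison trick concrete, here is a minimal Java sketch. 
It is not taken from the attached patch; the class and method names are 
hypothetical, and it assumes keys are compared as unsigned bytes in 
lexicographic order (binary string compare). The first 8 bytes of each key are 
packed once into a long, so most comparisons need a single unsigned long 
compare, and only ties fall back to a full byte-by-byte compare.

```java
// Hypothetical illustration only; not the code from MAPREDUCE-2841.
final class PrefixCompare {

    // Pack the first (up to) 8 bytes of the key into a long, big-endian,
    // zero-padded on the right when the key is shorter than 8 bytes.
    static long prefix(byte[] key) {
        long p = 0L;
        int n = Math.min(key.length, 8);
        for (int i = 0; i < n; i++) {
            p |= (key[i] & 0xFFL) << (56 - 8 * i);
        }
        return p;
    }

    // Plain lexicographic compare on unsigned bytes; shorter key sorts first
    // when one key is a prefix of the other.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Fast path: one unsigned long compare on the cached prefixes.
    // Slow path (equal prefixes): full byte-by-byte compare.
    static int compare(byte[] a, long prefixA, byte[] b, long prefixB) {
        if (prefixA != prefixB) {
            return Long.compareUnsigned(prefixA, prefixB);
        }
        return compareBytes(a, b);
    }
}
```

The prefix would be computed once per record when it is collected, so the 
sort itself touches the key bytes only when two prefixes are equal.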

bq. Subsequent implementations focused on reducing the number of spills for 
each task, because the cost of spilling dominated the cost of the sort. Even 
with a significant speedup in the sort step, avoiding a merge by managing 
memory more carefully usually effects faster task times.

I totally agree that spilling will be the dominant factor when it happens. So 
the question is how much more memory the Java implementation needs compared to 
the C++ one: 20%, 50%, or 100%? With that number we can calculate how likely it 
is that the C++ implementation avoids a spill.
(Note: based on our analysis of jobs run during the past month, most jobs need 
to shuffle less than 700MB of data per mapper.)
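
As a rough way to frame that calculation, here is a toy model in Java. It is 
an assumption for illustration, not a measurement: it treats the extra memory 
as a flat overhead fraction on top of the map output size and ignores spill 
thresholds and per-record metadata. The class name, method name, and the 
1024MB buffer used in the note below are hypothetical; only the 700MB figure 
comes from the note above.

```java
// Hypothetical toy model; the real collector's accounting is more involved.
final class SpillEstimate {

    // Number of spills if the map output, inflated by a flat memory overhead
    // fraction (0.2 = 20%), is written through a sort buffer of bufferMb.
    // A result of 1 means everything fits and no intermediate merge is needed.
    static long spills(double outputMb, double overhead, double bufferMb) {
        return (long) Math.ceil(outputMb * (1.0 + overhead) / bufferMb);
    }
}
```

Under this model, a mapper producing 700MB with a hypothetical 1024MB buffer 
gets spills(700, 0.2, 1024) = 1, but spills(700, 0.5, 1024) = 2 and 
spills(700, 1.0, 1024) = 2, so whether the overhead is closer to 20% or 50% 
decides whether a merge can be avoided.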


> Task level native optimization
> ------------------------------
>
>                 Key: MAPREDUCE-2841
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>         Environment: x86-64 Linux
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>         Attachments: MAPREDUCE-2841.v1.patch, dualpivot-0.patch, 
> dualpivotv20-0.patch
>
>
> I have recently been working on native optimization for MapTask based on JNI. 
> The basic idea is to add a NativeMapOutputCollector to handle k/v pairs 
> emitted by the mapper, so that sort, spill, and IFile serialization can all be 
> done in native code. A preliminary test (on Xeon E5410, jdk6u24) showed 
> promising results:
> 1. Sort is about 3x-10x as fast as Java (only binary string compare is 
> supported).
> 2. IFile serialization speed is about 3x that of Java, about 500MB/s; if 
> hardware CRC32C is used, things can get much faster (1G/s).
> 3. Merge code is not completed yet, so the test uses enough io.sort.mb to 
> prevent mid-spill.
> This leads to a total speedup of 2x~3x for the whole MapTask, if 
> IdentityMapper (mapper does nothing) is used.
> There are limitations of course: currently only Text and BytesWritable are 
> supported, and I have not thought through many things yet, such as how to 
> support map-side combine. I had some discussion with somebody familiar with 
> Hive, and it seems that these limitations won't be much of a problem for Hive 
> to benefit from those optimizations, at least. Advice or discussion about 
> improving compatibility is most welcome :) 
> Currently NativeMapOutputCollector has a static method called canEnable(), 
> which checks whether the key/value type, comparator type, and combiner are all 
> compatible; MapTask can then choose to enable NativeMapOutputCollector.
> This is only a preliminary test, and more work needs to be done. I expect 
> better final results, and I believe similar optimizations can be adopted for 
> the reduce task and shuffle too. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
