[jira] [Commented] (MAPREDUCE-2841) Task level native optimization

Binglin Chang (JIRA) Mon, 21 Jul 2014 01:38:30 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068323#comment-14068323
 ]


Binglin Chang commented on MAPREDUCE-2841:
------------------------------------------

I see the comments in the test code, but it doesn't help much, my env is 
ubuntu12, glibc: 2.15-0ubuntu10.3

{code}
Expected: expect[index]
Which is: 4102672832

uint32_t expect[5] = {-1538241715, -1288088794, -192294464, 563552421, 
1661521654};
  while(NULL != (key = reader->nextKey(length))) {
    int realKey = bswap(*(uint32_t *)(key));
    ASSERT_EQ(expect[index], realKey);
    index++;
  }
{code}

the expected value is not in expect array, maybe the uint32 to int32 is buggy? 
or there must be an "array index out of range" bug in the code? 



> Task level native optimization
> ------------------------------
>
>                 Key: MAPREDUCE-2841
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>         Environment: x86-64 Linux/Unix
>            Reporter: Binglin Chang
>            Assignee: Sean Zhong
>         Attachments: DESIGN.html, MAPREDUCE-2841.v1.patch, 
> MAPREDUCE-2841.v2.patch, dualpivot-0.patch, dualpivotv20-0.patch, 
> fb-shuffle.patch, hadoop-3.0-mapreduce-2841-2014-7-17.patch
>
>
> I'm recently working on native optimization for MapTask based on JNI. 
> The basic idea is that, add a NativeMapOutputCollector to handle k/v pairs 
> emitted by mapper, therefore sort, spill, IFile serialization can all be done 
> in native code, preliminary test(on Xeon E5410, jdk6u24) showed promising 
> results:
> 1. Sort is about 3x-10x as fast as java(only binary string compare is 
> supported)
> 2. IFile serialization speed is about 3x of java, about 500MB/s, if hardware 
> CRC32C is used, things can get much faster(1G/
> 3. Merge code is not completed yet, so the test use enough io.sort.mb to 
> prevent mid-spill
> This leads to a total speed up of 2x~3x for the whole MapTask, if 
> IdentityMapper(mapper does nothing) is used
> There are limitations of course, currently only Text and BytesWritable is 
> supported, and I have not think through many things right now, such as how to 
> support map side combine. I had some discussion with somebody familiar with 
> hive, it seems that these limitations won't be much problem for Hive to 
> benefit from those optimizations, at least. Advices or discussions about 
> improving compatibility are most welcome:) 
> Currently NativeMapOutputCollector has a static method called canEnable(), 
> which checks if key/value type, comparator type, combiner are all compatible, 
> then MapTask can choose to enable NativeMapOutputCollector.
> This is only a preliminary test, more work need to be done. I expect better 
> final results, and I believe similar optimization can be adopt to reduce task 
> and shuffle too. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (MAPREDUCE-2841) Task level native optimization

Reply via email to