[
https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092596#comment-13092596
]
Chris Douglas commented on MAPREDUCE-2841:
------------------------------------------
{quote}Just an idea: what if memory-related configurations could be random
variables, with a mean & variance? Could this lead to better resource
utilization? A fixed memory bound always means applications will request more
memory than they really need. I think in many cases predictable memory control
is enough, rather than precise memory control, since the latter is
impractical. We can use some dynamic memory if it stays within a predictable
range, for example +/-20%, +/-30%, etc.{quote}
The fixed memory bound definitely causes resource waste. Not only will users
ask for more memory than they need (particularly since most applications are
not tightly tuned), but in our clusters, users will just as often request far
too little. Because tasks' memory management is uniformly specified within a
job, there isn't even an opportunity for the framework to adapt to skew.
The random memory config is an interesting idea, but failed tasks are a
regrettable and expensive waste. For pipelines with SLAs, "random" failures
will probably motivate users to jack up their memory requirements to match the
range (which, if configurable, seems to encode the same contract). The point
of the precise specification was to avoid OOMs; because collection happens
across a JNI boundary, a "relaxed" but predictable memory footprint could be
easier to deploy, assuming a hard limit in the native code to avoid swapping.
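To make that concrete, here is a minimal sketch of what a relaxed-but-bounded
buffer policy might look like on the Java side. All names and the slack
parameter are hypothetical illustrations, not anything in the patch:
{code:java}
// Hypothetical sketch of a "relaxed" memory bound: the task requests a buffer
// size as usual, the collector may drift within a configured slack (e.g.
// +/-20%), but a hard cap, enforced in native code, is never exceeded.
public final class RelaxedBufferPolicy {
  private final long requestedBytes; // user-configured size, e.g. io.sort.mb
  private final double slack;        // e.g. 0.2 for +/-20%
  private final long hardCapBytes;   // absolute limit, to avoid swapping

  public RelaxedBufferPolicy(long requestedBytes, double slack, long hardCapBytes) {
    this.requestedBytes = requestedBytes;
    this.slack = slack;
    this.hardCapBytes = hardCapBytes;
  }

  /** The most the native allocator may grow to. */
  public long maxBytes() {
    return Math.min((long) (requestedBytes * (1.0 + slack)), hardCapBytes);
  }

  /** The least it may shrink to under memory pressure; still predictable. */
  public long minBytes() {
    return (long) (requestedBytes * (1.0 - slack));
  }
}
{code}
The point is only that the slack is bounded on both sides: the allocator gains
some freedom, but the contract stays predictable because the hard cap is
enforced natively.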
Thanks for the detail on the collection data structures. That makes it much
easier to orient oneself in the code.
A few quick notes on your
[earlier|https://issues.apache.org/jira/browse/MAPREDUCE-2841?focusedCommentId=13086973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13086973]
comment:
Adding the partition to each record was, again, done to make memory use more
predictable. The overhead in Java of tracking thousands of per-partition
buckets (many going unused) was worse than the per-record overhead,
particularly in large jobs. Further, user comparators are often horribly
inefficient, so the partition comparison and the related performance hit were
in the noise. The cache miss is real, but hard to reason about without leaving
the JVM.
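As a rough illustration of that trade-off, the per-record partition turns the
sort into a single (partition, key) comparison over one buffer. This is a
sketch only; the signature and offsets are hypothetical, though
WritableComparator.compareBytes() is the raw comparison Hadoop actually provides:
{code:java}
// Illustrative sketch: each record carries its partition id, so one buffer can
// be sorted into (partition, key) order without tracking thousands of
// per-partition buckets, many of which would go unused.
import org.apache.hadoop.io.WritableComparator;

public final class PartitionedRecordComparator {
  public static int compare(int partA, byte[] bufA, int keyOffA, int keyLenA,
                            int partB, byte[] bufB, int keyOffB, int keyLenB) {
    // Cheap integer comparison first: records bound for different partitions
    // never reach the (often slow) user key comparison at all.
    if (partA != partB) {
      return partA < partB ? -1 : 1;
    }
    // Within a partition, fall back to a raw, memcmp-style byte comparison.
    return WritableComparator.compareBytes(bufA, keyOffA, keyLenA,
                                           bufB, keyOffB, keyLenB);
  }
}
{code}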
The decorator-based stream is (or at least was) required by the serialization
interface. While the current patch only supports records with a known
serialized length, the contract for other types is more general. Probably too
general, but users with occasional several-hundred-MB records (written in
chunks) do exist. Supporting them in this implementation is not a critical use
case, since they can just use the existing collector. Restricting this
collector to memcmp types could also shift the burden of user comparators onto
the serialization frameworks, which is probably the best strategy. Which is to
say: obsoleting the existing collection framework doesn't require that this
one support all of its use cases, if some of them can be handled more
competently elsewhere. If its principal focus is performance, it may make
sense not to support inherently slow semantics.
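For context, the constraint comes from Hadoop's serialization API:
org.apache.hadoop.io.serializer.Serializer exposes only open(OutputStream) and
serialize(T), so the collector has to interpose a stream decorator to route
bytes into its buffer. A minimal sketch, with the backing array as a
hypothetical stand-in for the real (native-backed) collector storage:
{code:java}
// Why the decorator exists: serialized bytes can reach the collection buffer
// only through an OutputStream wrapper, since that is all the Serializer
// contract offers. The byte[] here is a hypothetical stand-in.
import java.io.IOException;
import java.io.OutputStream;

public class BufferBackedStream extends OutputStream {
  private final byte[] buffer; // stand-in for the collector's buffer
  private int pos;

  public BufferBackedStream(byte[] buffer) { this.buffer = buffer; }

  @Override
  public void write(int b) throws IOException {
    write(new byte[] { (byte) b }, 0, 1);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    if (pos + len > buffer.length) {
      throw new IOException("record larger than intermediate buffer");
    }
    System.arraycopy(b, off, buffer, pos, len);
    pos += len;
  }

  /** For known-length types, comparable against the declared length. */
  public int bytesWritten() { return pos; }
}
{code}
A serializer then writes through it (serializer.open(stream);
serializer.serialize(key);), and for known-length types bytesWritten() can be
validated against the declared serialized length.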
Which brings up a point: what is the scope of this JIRA? A full, native task
runtime is a formidable job. Even if it only supported memcmp key types, no
map-side combiner, no user-defined comparators, and records smaller than its
intermediate buffer, such an improvement would still cover a lot of user jobs.
It might make sense to commit that subset as optional functionality first, then
iterate based on feedback.
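The canEnable() hook mentioned in the description below already gives that
subset its gate; here is a hedged sketch of what the checks might look like
(canEnable() and the Text/BytesWritable restriction come from the issue, the
individual tests are otherwise hypothetical):
{code:java}
// Sketch of the opt-in gate: the native collector advertises the subset it
// supports, and MapTask falls back to the existing collector otherwise.
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;

public final class NativeCollectorGate {
  public static boolean canEnable(JobConf job) {
    // memcmp ordering only: bail out if a user comparator is configured
    if (job.get("mapred.output.key.comparator.class") != null) {
      return false;
    }
    // no map-side combiner in the first cut (getCombinerClass() is null if unset)
    if (job.getCombinerClass() != null) {
      return false;
    }
    // only types with a memcmp-able, known-length serialization
    return isSupported(job.getMapOutputKeyClass())
        && isSupported(job.getMapOutputValueClass());
  }

  private static boolean isSupported(Class<?> c) {
    return c == Text.class || c == BytesWritable.class;
  }
}
{code}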
> Task level native optimization
> ------------------------------
>
> Key: MAPREDUCE-2841
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: task
> Environment: x86-64 Linux
> Reporter: Binglin Chang
> Assignee: Binglin Chang
> Attachments: MAPREDUCE-2841.v1.patch, dualpivot-0.patch,
> dualpivotv20-0.patch
>
>
> I've recently been working on a native optimization for MapTask based on JNI.
> The basic idea is to add a NativeMapOutputCollector to handle the k/v pairs
> emitted by the mapper, so that sort, spill, and IFile serialization can all be
> done in native code. Preliminary tests (on a Xeon E5410, jdk6u24) showed
> promising results:
> 1. Sort is about 3x-10x as fast as Java (only binary string comparison is
> supported)
> 2. IFile serialization is about 3x as fast as Java, around 500MB/s; if
> hardware CRC32C is used, it can get much faster (1GB/s).
> 3. Merge code is not complete yet, so the tests used a large enough
> io.sort.mb to prevent mid-spills.
> This amounts to a total speedup of 2x~3x for the whole MapTask when
> IdentityMapper (a mapper that does nothing) is used.
> There are limitations, of course: currently only Text and BytesWritable are
> supported, and I have not thought through many things yet, such as how to
> support map-side combine. I had some discussion with someone familiar with
> Hive; it seems these limitations won't be much of a problem for Hive to
> benefit from these optimizations, at least. Advice or discussion about
> improving compatibility is most welcome :)
> Currently NativeMapOutputCollector has a static method called canEnable(),
> which checks whether the key/value types, comparator, and combiner are all
> compatible; MapTask can then choose to enable NativeMapOutputCollector.
> This is only a preliminary test; more work needs to be done. I expect better
> final results, and I believe similar optimizations can be applied to the
> reduce task and shuffle too.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira