Dmitry Goldin commented on CRUNCH-602:


We have also hit this issue on our cluster, where it puts severe load on the 
NameNode, especially since we run fairly large Crunch jobs (tens of thousands 
of mappers and hundreds or thousands of reducers). 

The number of reads of the COMBINE file is orders of magnitude larger than 
necessary: instead of roughly $num_mappers*$num_spills + $num_reducers reads, 
we see roughly $num_mappers*$num_spills*$num_reducers (simplified).
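To put purely illustrative numbers on that: with 20,000 mappers, 2 spills per 
mapper and 500 reducers, the first formula gives about 20,000*2 + 500 = 40,500 
reads, while the second gives about 20,000*2*500 = 20,000,000 reads.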

The initial description of this issue is pretty much spot on, but there are 
additional scenarios in which combiners are initialised (for example once per 
spill), and the problem extends to the MAP and REDUCE files as well, although 
the effect there is not amplified as strongly. If a more detailed analysis is 
required, I can try to write up some of the internal investigation I have been 
doing.

At the moment we are internally testing a patched version which, as far as I 
can judge, alleviates the issue.
I will attach the patch for information purposes; a rough sketch of the idea is 
below.
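
The sketch below only conveys the general shape of the change (the class and 
method names are made up for illustration; the attached patch is authoritative): 
memoize the raw bytes of the serialized plan file per path in a Guava cache, so 
the HDFS/NameNode read happens once per JVM, while deserialization still happens 
on every call because RTNodes are stateful and cannot be shared between task 
contexts.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.util.concurrent.ExecutionException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.io.ByteStreams;

public final class CachedDistCacheRead {

  // One entry per serialized plan file, shared by every combiner in this JVM.
  private static final Cache<String, byte[]> BYTES =
      CacheBuilder.newBuilder().maximumSize(16).build();

  private CachedDistCacheRead() { }

  public static Object read(final Configuration conf, final Path path) throws IOException {
    try {
      byte[] serialized = BYTES.get(path.toString(), () -> {
        // Only this loader touches HDFS (and therefore the NameNode).
        FileSystem fs = path.getFileSystem(conf);
        try (InputStream in = fs.open(path)) {
          return ByteStreams.toByteArray(in);
        }
      });
      // Deserialize on every call so each task context still gets its own RTNode
      // instances; ObjectInputStream stands in for however DistCache materializes them.
      try (ObjectInputStream ois =
          new ObjectInputStream(new ByteArrayInputStream(serialized))) {
        return ois.readObject();
      }
    } catch (ExecutionException | ClassNotFoundException e) {
      throw new IOException("Unable to read cached object from " + path, e);
    }
  }
}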

> Combiner initialization repeatedly retrieves RT nodes from DistCache, leading 
> to high NN load
> ---------------------------------------------------------------------------------------------
>                 Key: CRUNCH-602
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-602
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.12.0, 0.13.0, 0.14.0
>         Environment: Crunch 0.14-SNAPSHOT, CDH5.6.0
>            Reporter: Michael Rose
>            Assignee: Josh Wills
>              Labels: performance
>         Attachments: crunch-602.patch
> When running one of our Crunch pipelines, we noticed our NameNode under very 
> heavy load. We run our masters on pretty light hardware, so our NN was 
> sitting at 100% CPU.
> Crunch reads the RTNodes during creation of a CrunchTaskContext. These are 
> created when Mappers and Reducers are created. Importantly, a CrunchCombiner 
> is a subclass of a Reducer, so each mapper will create R combiners where R is 
> the number of reducers and thus R CrunchTaskContexts. Consequently in highly 
> parallel jobs, this means M*R semi-expensive calls to the NameNode.
> In the constructor for CrunchTaskContext, this is the read to the DistCache:
> this.nodes = (List<RTNode>) DistCache.read(conf, path);
> This then leads to a read against the NN plus deserialization.
> For now, we took the overly simplistic approach of caching the results of the 
> DistCache read in a Guava cache. The cache ensures combiners reuse RTNodes 
> with only the overhead of deserialization which is somewhat unavoidable as 
> RTNodes are stateful and not reusable. However, it's not configurable except 
> by modifying code.
> I'll attach the patch, but given that it's not yet configurable I wouldn't 
> call it a "fix available." There may be much better ways of fixing this issue 
> as well -- if you have some guidance I'd be happy to do the legwork on a 
> patch.
