[ https://issues.apache.org/jira/browse/CRUNCH-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Goldin updated CRUNCH-602:
---------------------------------
    Attachment: quickfix_crunch_distcache.patch

> Combiner initialization repeatedly retrieves RT nodes from DistCache, leading to high NN load
> ---------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-602
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-602
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.12.0, 0.13.0, 0.14.0
>         Environment: Crunch 0.14-SNAPSHOT, CDH5.6.0
>            Reporter: Michael Rose
>            Assignee: Josh Wills
>              Labels: performance
>         Attachments: crunch-602.patch, quickfix_crunch_distcache.patch
>
>
> When running one of our Crunch pipelines, we noticed our NameNode under very
> heavy load. We run our masters on pretty light hardware, so our NN was
> sitting at 100% CPU.
>
> Crunch reads the RTNodes during creation of a CrunchTaskContext. These are
> created when Mappers and Reducers are created. Importantly, a CrunchCombiner
> is a subclass of a Reducer, so each mapper will create R combiners where R is
> the number of reducers and thus R CrunchTaskContexts. Consequently in highly
> parallel jobs, this means M*R semi-expensive calls to the NameNode.
>
> In the constructor for CrunchTaskContext, this is the read to the DistCache:
>
>     this.nodes = (List<RTNode>) DistCache.read(conf, path);
>
> Which then leads to a read into the NN + deserialization.
>
> For now, we took the overly simplistic approach of caching the results of the
> DistCache read in a Guava cache. The cache ensures combiners reuse RTNodes
> with only the overhead of deserialization which is somewhat unavoidable as
> RTNodes are stateful and not reusable. However, it's not configurable except
> by modifying code.
>
> I'll attach the patch, but given that it's not yet configurable I wouldn't
> call it a "fix available." There may be much better ways of fixing this issue
> as well -- if you have some guidance I'd be happy to do the legwork on a
> patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
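
For readers following the issue, the caching idea described above can be sketched roughly as follows. This is not the attached patch: it swaps the Guava cache for a plain ConcurrentHashMap so the example stays self-contained, and it assumes the cached payload is Java-serialized. CachedNodeReader, its loader function, and serialize are hypothetical names introduced only for illustration; the loader stands in for the expensive fetch behind DistCache.read(conf, path). The raw bytes are fetched once per path, and every caller deserializes its own copy, since the issue notes that RTNodes are stateful and not reusable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper, not part of Crunch: caches the raw bytes behind a path. */
final class CachedNodeReader {
    // Stand-in (assumption) for the expensive remote fetch behind DistCache.read(conf, path).
    private final Function<String, byte[]> loader;
    // One entry per path, holding the serialized form rather than live objects.
    private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();

    CachedNodeReader(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    /** Fetches the bytes at most once per path, then deserializes a fresh copy per call. */
    Object read(String path) {
        byte[] bytes = cache.computeIfAbsent(path, loader);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Java-serializes a value; used here only to fabricate demo input. */
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

In a real task JVM the reader would presumably need to live in a static field so that all combiners created by one mapper share it; with that in place, the remote reads drop from one per CrunchTaskContext to one per path per JVM, while the per-context deserialization cost remains.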