[ https://issues.apache.org/jira/browse/CRUNCH-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Micah Whitacre updated CRUNCH-624:
----------------------------------
    Fix Version/s: 0.15.0

> temporary table size is 0, which makes reducer number too small
> ---------------------------------------------------------------
>
>                 Key: CRUNCH-624
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-624
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: JingChen
>            Assignee: Josh Wills
>             Fix For: 0.15.0
>
>         Attachments: CRUNCH-624.patch
>
>
> If the pipeline produces a temporary table, the number of reducers for a
> temporary table whose input is itself a temporary table may become very
> small in some cases, because the input temporary table has no content yet.
> I may have found the root cause in my case:
> {code:title=PCollectionImpl.java|borderStyle=solid}
>   public void materializeAt(SourceTarget<S> sourceTarget) {
>     this.materializedAt = sourceTarget;
>     this.size = materializedAt.getSize(getPipeline().getConfiguration());
>   }
>
>   @Override
>   public long getSize() {
>     if (size < 0) {
>       this.size = getSizeInternal();
>     }
>     return size;
>   }
> {code}
> PCollectionImpl.materializeAt(sourceTarget) is invoked when a node is split
> to create a temporary table. The sourceTarget is bound to the new temporary
> table, whose size is 0 because its path was just created, so this.size is
> set to 0. Later, when getSize() is invoked to set the number of reducers,
> the check size < 0 fails because the size is 0, so the method just returns
> 0, which makes the number of reducers too small.
> So I think materializeAt() should check the sourceTarget's size, like
> below:
> {code:title=PCollectionImpl.java|borderStyle=solid}
>   public void materializeAt(SourceTarget<S> sourceTarget) {
>     this.materializedAt = sourceTarget;
>     long size = materializedAt.getSize(getPipeline().getConfiguration());
>     if (size > 0) {
>       this.size = size;
>     }
>   }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
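
The caching behaviour described in the report can be reproduced in isolation. The following is a minimal sketch only, not Crunch code: `SizeCacheSketch`, `SketchCollection`, the `guardZero` flag and the `LongSupplier` standing in for `getSizeInternal()` are all hypothetical names introduced for illustration, and the SourceTarget is reduced to a plain `long` size. It assumes, as the report does, that a negative cached size means "not computed yet".

```java
import java.util.function.LongSupplier;

public class SizeCacheSketch {

    /** Hypothetical stand-in for the size caching in PCollectionImpl; not the real class. */
    static class SketchCollection {
        private long size = -1;               // negative means "not computed yet"
        private final LongSupplier estimate;  // stand-in for getSizeInternal()
        private final boolean guardZero;      // true = behaviour proposed in the report

        SketchCollection(LongSupplier estimate, boolean guardZero) {
            this.estimate = estimate;
            this.guardZero = guardZero;
        }

        /** targetSize plays the role of materializedAt.getSize(conf). */
        void materializeAt(long targetSize) {
            // Without the guard, a freshly created (empty) target caches 0.
            if (!guardZero || targetSize > 0) {
                this.size = targetSize;
            }
        }

        long getSize() {
            // A cached 0 is not negative, so the estimate is never consulted.
            if (size < 0) {
                size = estimate.getAsLong();
            }
            return size;
        }
    }

    /** Materializes at a target of the given size, then asks for the size. */
    public static long sizeAfterMaterialize(long targetSize, long estimatedSize, boolean guardZero) {
        SketchCollection c = new SketchCollection(() -> estimatedSize, guardZero);
        c.materializeAt(targetSize);
        return c.getSize();
    }

    public static void main(String[] args) {
        // Unguarded: the empty target's 0 is cached and returned.
        System.out.println(sizeAfterMaterialize(0L, 1_000_000L, false)); // prints 0
        // Guarded: the 0 is ignored, so getSize() falls back to the estimate.
        System.out.println(sizeAfterMaterialize(0L, 1_000_000L, true));  // prints 1000000
    }
}
```

With the guard in place, a non-empty target still overrides the estimate; only a zero-size target is ignored, which is the case the report attributes to a path that was just created.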