[ https://issues.apache.org/jira/browse/CRUNCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Wills updated CRUNCH-525:
------------------------------
    Attachment: CRUNCH-525b.patch

I played with a few different versions of [~spatel89]'s formula and felt that the average of the two scale factors was the best default guess, on the assumption that keys.size == values.size in this context. This patch reflects that.

> The ExtractKeyFn is has an incorrect scale factor
> -------------------------------------------------
>
>                 Key: CRUNCH-525
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-525
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.12.0
>            Reporter: Stephen Patel
>            Assignee: Josh Wills
>            Priority: Minor
>         Attachments: CRUNCH-525.patch, CRUNCH-525b.patch
>
>
> The ExtractKeyFn[0] used by the by[1] method of the PCollectionImpl is using
> the default scale factor for a MapFn (1.0). It should be using 1.0 + the
> scale factor of the wrapped MapFn, in order to be accurate.
> [0]: https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/fn/ExtractKeyFn.java
> [1]: https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/impl/dist/collect/PCollectionImpl.java#L270

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
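For readers following the arithmetic, here is a minimal standalone sketch of the scale factors discussed in this issue. It does not use the Crunch API and is not the contents of either patch: the class name `ScaleFactorSketch` and its methods are hypothetical, and because the comment does not spell out which two scale factors are averaged, the averaging helper simply takes both as parameters.

```java
// Hypothetical illustration only; not taken from CRUNCH-525.patch or CRUNCH-525b.patch.
public class ScaleFactorSketch {

  // The default scale factor for a MapFn, per the issue description (1.0).
  static float defaultScale() {
    return 1.0f;
  }

  // The reporter's suggestion: 1.0 + the scale factor of the wrapped MapFn.
  // The output pair carries the original value (1.0) plus the extracted key.
  static float proposedScale(float wrappedScale) {
    return 1.0f + wrappedScale;
  }

  // Plain average of two candidate scale factors. Which two factors the
  // patch averages is an assumption left to the caller.
  static float averagedScale(float first, float second) {
    return (first + second) / 2.0f;
  }

  public static void main(String[] args) {
    float wrapped = 0.5f; // assumed scale factor of the wrapped key-extraction MapFn
    System.out.println("default:  " + defaultScale());
    System.out.println("proposed: " + proposedScale(wrapped));
    System.out.println("averaged: " + averagedScale(defaultScale(), proposedScale(wrapped)));
  }
}
```

With keys.size == values.size, the wrapped function's scale factor would be 1.0, so the reporter's formula gives 2.0 for the key/value pair output.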