Work on CRUNCH-294 started by Josh Wills.
> Cost-based job planning
> -----------------------
>
> Key: CRUNCH-294
> URL: https://issues.apache.org/jira/browse/CRUNCH-294
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Reporter: Josh Wills
> Assignee: Josh Wills
> Attachments: CRUNCH-294.patch
>
>
> A bug report on the user list drove me to revisit some of the core planning
> logic, particularly around how we decide where to split up DoFns between two
> dependent MapReduce jobs.
> I found an old TODO about using the scale factor from a DoFn to decide where
> to split up the nodes between dependent GBKs, so I implemented a new version
> of the split algorithm. It takes advantage of the support we've propagated
> for multiple outputs on both the map and reduce sides of a job to do
> finer-grained splits, and it uses information from the scaleFactor
> calculations to make smarter split decisions.
> One high-level change along with this: I changed the default scaleFactor()
> value in DoFn to 0.99f to slightly prefer writes that occur later in a
> pipeline flow by default.
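
To illustrate the idea described above, here is a minimal sketch of a scale-factor-driven split choice. It is not the code from CRUNCH-294.patch and does not use the Crunch planner API. The class SplitPointSketch and the method chooseSplit are hypothetical names, and the cost model (write the intermediate output where the estimated data size is smallest) is an assumption made for illustration only.

```java
// Hypothetical illustration only; not the Crunch planner implementation.
final class SplitPointSketch {

    private SplitPointSketch() {}

    /**
     * Given the scale factors of a linear chain of functions that sits
     * between two dependent jobs, returns how many of those functions
     * should run BEFORE the intermediate output is written
     * (0 means "write before any of them run").
     *
     * Assumed cost model: the relative data size after each function is
     * the running product of the scale factors, and the cheapest place
     * to write is where that product is smallest.
     */
    static int chooseSplit(double[] scaleFactors) {
        double size = 1.0;      // relative size of the data at the current point
        double best = size;     // smallest size seen so far
        int bestIndex = 0;      // split position that achieves it
        for (int i = 0; i < scaleFactors.length; i++) {
            size *= scaleFactors[i];
            // Strict comparison: an earlier point wins exact ties.
            if (size < best) {
                best = size;
                bestIndex = i + 1;
            }
        }
        return bestIndex;
    }
}
```

Under this assumed model, a chain in which every function reports 0.99 shrinks the estimate a little at each step, so chooseSplit picks the latest possible point. That matches the stated intent of slightly preferring writes that occur later in the pipeline. A function that reports a large scale factor pushes the split to before it.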