[
https://issues.apache.org/jira/browse/CRUNCH-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828078#comment-13828078
]
Josh Wills commented on CRUNCH-294:
-----------------------------------
(Will fix the header as well.)
> Cost-based job planning
> -----------------------
>
> Key: CRUNCH-294
> URL: https://issues.apache.org/jira/browse/CRUNCH-294
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Reporter: Josh Wills
> Assignee: Josh Wills
> Attachments: CRUNCH-294.patch, CRUNCH-294b.patch, CRUNCH-294c.patch,
> jobplan-default-new.png, jobplan-default-old.png, jobplan-large_s2_s3.png,
> jobplan-lopsided.png
>
>
> A bug report on the user list drove me to revisit some of the core planning
> logic, particularly around how we decide where to split up DoFns between two
> dependent MapReduce jobs.
> I found an old TODO about using the scale factor from a DoFn to decide where
> to split up the nodes between dependent GBKs, so I implemented a new version
> of the split algorithm. It takes advantage of the support we have propagated
> for multiple outputs on both the map and reduce sides of a job, and it uses
> information from the scaleFactor calculations to make finer-grained, smarter
> split decisions.
> One high-level change along with this: I changed the default scaleFactor()
> value in DoFn to 0.99f so that the planner slightly prefers writes that occur
> later in a pipeline flow.
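
To illustrate why a default scale factor just under 1.0 nudges writes later, here is a minimal sketch of a toy cost model. It is not the code in the attached patches and does not use any Crunch classes: the class name SplitSketch, the method bestSplit, and the rule "write where the estimated data size is smallest" are all assumptions made for illustration. Each entry in the array stands for the scaleFactor() of one DoFn in a chain between two dependent GBKs.

```java
import java.util.Arrays;

// Toy model only; hypothetical names, not Crunch's planner code.
final class SplitSketch {

    // scaleFactors[i] is the assumed output/input size ratio of the i-th
    // DoFn in the chain. Returns k, meaning the intermediate write happens
    // after the first k DoFns (k = 0 writes before any of them run).
    static int bestSplit(float[] scaleFactors) {
        double size = 1.0;          // relative data size at the current point
        int best = 0;
        double bestSize = size;
        for (int k = 1; k <= scaleFactors.length; k++) {
            size *= scaleFactors[k - 1];
            if (size < bestSize) {  // strict comparison: ties keep the earlier point
                best = k;
                bestSize = size;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[] defaults = {0.99f, 0.99f, 0.99f};
        float[] mixed = {0.99f, 0.1f, 5.0f};
        System.out.println(Arrays.toString(defaults) + " -> " + bestSplit(defaults));
        System.out.println(Arrays.toString(mixed) + " -> " + bestSplit(mixed));
    }
}
```

In this model, a chain in which every DoFn reports 0.99f shrinks slightly at each step, so the latest point (k = 3) is chosen. A DoFn that reports a strong reduction followed by one that expands the data pulls the write to just after the reduction (k = 2).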