Getting on a plane. :)

That's not the way materialize works for unions, at least not currently. I could change how materialize works and see whether that's too invasive. I have some ideas.

On Nov 20, 2013 7:23 AM, "Gabriel Reid (JIRA)" <[email protected]> wrote:
>
> [https://issues.apache.org/jira/browse/CRUNCH-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827755#comment-13827755]
>
> Gabriel Reid commented on CRUNCH-294:
> -------------------------------------
>
> The logic for choosing splits sounds good to me.
>
> About the breakpoint() method: is it not possible to get the same
> functionality by calling materialize() on a PCollection? I'm also a little
> bit worried about the name -- I think it could cause some confusion in
> terms of debugger breakpoints (but maybe that's just me). The only other
> name I can think of for that is checkpoint(), which maybe also brings its
> share of confusion along with it.
>
> And one other small thing: I noticed a Cloudera file header in
> BreakpointIT.java.
>
> > Cost-based job planning
> > -----------------------
> >
> >                 Key: CRUNCH-294
> >                 URL: https://issues.apache.org/jira/browse/CRUNCH-294
> >             Project: Crunch
> >          Issue Type: Improvement
> >          Components: Core
> >            Reporter: Josh Wills
> >            Assignee: Josh Wills
> >         Attachments: CRUNCH-294.patch, CRUNCH-294b.patch,
> >                      jobplan-default-new.png, jobplan-default-old.png,
> >                      jobplan-large_s2_s3.png, jobplan-lopsided.png
> >
> >
> > A bug report on the user list drove me to revisit some of the core
> > planning logic, particularly around how we decide where to split up DoFns
> > between two dependent MapReduce jobs.
> > I found an old TODO about using the scale factor from a DoFn to decide
> > where to split up the nodes between dependent GBKs, so I implemented a new
> > version of the split algorithm that takes advantage of how we've propagated
> > support for multiple outputs on both the map and reduce sides of a job to
> > do finer-grained splits that use information from the scaleFactor
> > calculations to make smarter split decisions.
> > One high-level change along with this: I changed the default
> > scaleFactor() value in DoFn to 0.99f to slightly prefer writes that occur
> > later in a pipeline flow by default.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.1#6144)
>
