[ https://issues.apache.org/jira/browse/CRUNCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034330#comment-14034330 ]
Josh Wills commented on CRUNCH-420:
-----------------------------------

Yep, reading over it now. So it seems like we have two situations where breakpointing is needed (I may have this wrong, but I'm going to try to write it up):

1) We have two dependent GBK (group-by-key) operations, and we want to signal to the planner where to split between them, which is handled by CRUNCH-294.

2) We have a single data prep step that is going to feed multiple downstream GBKs. We don't want to run it twice in separate jobs (either because it's compute-intensive, or because it does an amazing job of filtering a large output file), so we mark it as materialized and have it created in a single map-only job that then feeds the downstream GBKs, which is handled by this patch.

Is there another breakpoint situation I'm missing? Is there a reduce-side version of this problem?

> Breakpoints Not Working
> -----------------------
>
>                 Key: CRUNCH-420
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-420
>             Project: Crunch
>          Issue Type: Bug
>         Environment: Crunch 0.8.2
>            Reporter: Allan Shoup
>            Assignee: Josh Wills
>         Attachments: Breakpoint2IT.java, CRUNCH-420.patch, testBreakpoint_plan.png
>
>
> Reading through CRUNCH-294, it looks like materialize is supposed to function
> as a breakpoint to the planner. I've seen several plans where it appeared to
> me that a particular DoFn shouldn't have been repeated, but it was.
> I'll attach some supporting material.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
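To make situation 2 concrete, here is a toy model in plain Java. It is not Crunch's API or its planner, just an illustration under stated assumptions: a hypothetical `prep` step feeds two downstream group-and-count "jobs", and the sketch counts how often `prep` runs with and without a breakpoint. The names `prep`, `groupAndCount`, and `runPipeline` are illustrative only; in Crunch itself the breakpoint is requested by marking the intermediate collection as materialized.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class BreakpointSketch {
    // Number of times the hypothetical prep step has been invoked.
    static int prepCalls = 0;

    // Stand-in for an expensive (or heavily filtering) data-prep DoFn.
    static String prep(String record) {
        prepCalls++;
        return record.trim().toLowerCase();
    }

    // Stand-in for a downstream GBK: group records by a key and count each group.
    static Map<String, Long> groupAndCount(List<String> records, Function<String, String> keyFn) {
        return records.stream()
                .collect(Collectors.groupingBy(keyFn, TreeMap::new, Collectors.counting()));
    }

    // Runs two GBK "jobs" over the prepped data and returns how many times prep ran.
    static int runPipeline(List<String> input, boolean materializePrep) {
        prepCalls = 0;
        Function<String, String> byFirstChar = s -> s.substring(0, 1);
        Function<String, String> byLength = s -> Integer.toString(s.length());
        if (materializePrep) {
            // Breakpoint: one map-only pass computes prep once and keeps the result;
            // both GBK jobs then read that shared output.
            List<String> prepped = input.stream()
                    .map(BreakpointSketch::prep)
                    .collect(Collectors.toList());
            groupAndCount(prepped, byFirstChar);
            groupAndCount(prepped, byLength);
        } else {
            // No breakpoint: each GBK job re-runs prep in its own map phase.
            groupAndCount(input.stream().map(BreakpointSketch::prep).collect(Collectors.toList()), byFirstChar);
            groupAndCount(input.stream().map(BreakpointSketch::prep).collect(Collectors.toList()), byLength);
        }
        return prepCalls;
    }

    public static void main(String[] args) {
        List<String> input = List.of(" Apple", "avocado ", "Banana");
        System.out.println("without breakpoint: " + runPipeline(input, false)); // prints "without breakpoint: 6"
        System.out.println("with breakpoint: " + runPipeline(input, true));     // prints "with breakpoint: 3"
    }
}
```

Without the breakpoint, each of the two downstream jobs repeats `prep` for every record (6 calls for 3 records); with it, the prep output is computed once and shared (3 calls). The reporter's observation corresponds to seeing the first behavior in plans where the second was expected.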