[ 
https://issues.apache.org/jira/browse/CRUNCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034330#comment-14034330
 ] 

Josh Wills commented on CRUNCH-420:
-----------------------------------

Yep, reading over it now. So it seems like we have two situations where 
breakpointing is needed (I may have this wrong, but I'm going to try to write 
it up):

1) We have two dependent GBK operations, and we want to signal to the planner 
where to split in between them, which is handled by CRUNCH-294.
2) We have a single data prep step that is going to feed multiple downstream 
GBKs. We don't want to run it twice in separate jobs (either b/c it's 
compute-intensive, or b/c it does an amazing job of filtering a large output 
file), so we mark it as materialized and have it created in a single map-only 
job that then feeds the downstream GBKs, which is handled by this patch.
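
To make situation 2 concrete, here is a toy sketch in plain Java. It does not 
use Crunch's API or planner; the class, the method names, and the sample data 
are all hypothetical. It only counts how many records the shared prep step 
processes when that step is fused into each downstream job versus when its 
output is computed once and reused:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Toy model only: NOT Crunch code. All names here are hypothetical.
public class BreakpointSketch {
    // Number of records the "expensive" prep step has processed so far.
    static int prepCalls = 0;

    // Hypothetical prep step: drop empty lines and lower-case the rest.
    static List<String> prep(List<String> input) {
        List<String> out = new ArrayList<>();
        for (String line : input) {
            prepCalls++;
            if (!line.isEmpty()) {
                out.add(line.toLowerCase());
            }
        }
        return out;
    }

    // Stand-in for a GBK plus a count: group records by key, count each group.
    static Map<String, Integer> groupAndCount(List<String> records,
                                              Function<String, String> keyFn) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String r : records) {
            counts.merge(keyFn.apply(r), 1, Integer::sum);
        }
        return counts;
    }

    // No breakpoint: prep is fused into each downstream job, so it runs twice.
    static int runFused(List<String> input) {
        prepCalls = 0;
        groupAndCount(prep(input), r -> r.substring(0, 1));
        groupAndCount(prep(input), r -> String.valueOf(r.length()));
        return prepCalls;
    }

    // Breakpoint: prep runs once (the "map-only job") and both GBKs read
    // its stored output.
    static int runMaterialized(List<String> input) {
        prepCalls = 0;
        List<String> materialized = prep(input);
        groupAndCount(materialized, r -> r.substring(0, 1));
        groupAndCount(materialized, r -> String.valueOf(r.length()));
        return prepCalls;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Apple", "", "avocado", "Banana");
        System.out.println("fused prep calls: " + runFused(input));
        System.out.println("materialized prep calls: " + runMaterialized(input));
    }
}
```

In this model the fused run processes every input record once per downstream 
GBK, while the materialized run processes each record once in total. That is 
the saving the breakpoint is meant to give.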

Is there another breakpoint situation I'm missing? Is there a reduce-side 
version of this problem?

> Breakpoints Not Working
> -----------------------
>
>                 Key: CRUNCH-420
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-420
>             Project: Crunch
>          Issue Type: Bug
>         Environment: Crunch 0.8.2
>            Reporter: Allan Shoup
>            Assignee: Josh Wills
>         Attachments: Breakpoint2IT.java, CRUNCH-420.patch, 
> testBreakpoint_plan.png
>
>
> Reading through CRUNCH-294, it looks like materialize is supposed to function 
> as a breakpoint to the planner. I've seen several plans where it appeared to 
> me a particular DoFn shouldn't have been repeated, but it was.
> I'll attach some supporting material.



