[ 
https://issues.apache.org/jira/browse/CRUNCH-438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053632#comment-14053632
 ] 

Christian Tzolov commented on CRUNCH-438:
-----------------------------------------

h5. Some unresolved topics:
# The BASE_GRAPH_PLANE_DOTFILE and SPLIT_GRAPH_PLANE_DOTFILE diagrams are 
generated inside the while loop in the MSCRPlanner#plan() method:
{code:title=MSCRPlanner.java|borderStyle=solid}
public MRExecutor plan(...)  {
   ...
   while (!targetDeps.isEmpty()) {
      ...
      // create BASE_GRAPH_PLANE_DOTFILE
      // create SPLIT_GRAPH_PLANE_DOTFILE
     ...
   }
   ...
}
{code}
The current implementation registers only the graphs from the last 
iteration, so the graphs from earlier iterations are lost!
# For the RTNode diagram (RTNODES_PLAN_DOTFILE) I have not figured out how to 
connect the dependent jobs.
MRJob#getDependentJobs() returns a list of dependent jobs, but it is not 
clear which output should be wired to which input. The wiring in the diagram 
should mirror the exact logic in the code. If I am not mistaken, the wiring 
information has to be retrieved from the job completion hook attributes.
# To abstract the tracing logic (e.g. the dotfile generation) from the main 
code, I've been thinking of a trace interface (below). One or more 
implementations would be registered with the planner and notified on each 
planning event.
{code:title=PlannerTracker.java|borderStyle=solid}
interface PlannerTracker {
   void onPCollectionPlan(String name,
         Map<PCollection<?>, Set<Target>> outputs);
   void onBaseGraphPlan(String name, Graph graph,
         Map<PCollection<?>, Set<Target>> outputs);
   void onSplitGraphPlan(String name, Graph graph,
         Map<PCollection<?>, Set<Target>> outputs,
         List<List<Vertex>> components);
   void onRunTimeConfiguration(String name, List<MRJob> jobs);
   void onPipelinePlan(String name, List<JobPrototype> protos);
}
{code}
This is pretty rough, but hopefully it will help to start the discussion.
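
A minimal sketch of how such a tracker could be registered with a planner and notified from inside the planning loop, so that the graphs of every iteration are kept rather than only those of the last one. It is an illustration under assumptions, not the patch itself: {{Graph}} and {{Planner}} are simplified stand-ins for the real Crunch classes, the interface is trimmed to the two graph callbacks, and the iteration-qualified names ({{iteration-0}}, {{iteration-1}}, ...) are a hypothetical naming scheme.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PlannerTrackerSketch {

  /** Hypothetical stand-in for the planner's graph type; not the real Crunch class. */
  static final class Graph {
    final String label;
    Graph(String label) { this.label = label; }
    String toDot() { return "digraph G { \"" + label + "\"; }"; }
  }

  /** Trimmed-down tracker: only the two graph callbacks from the proposal. */
  interface PlannerTracker {
    void onBaseGraphPlan(String name, Graph graph);
    void onSplitGraphPlan(String name, Graph graph);
  }

  /** Stores one dotfile per (kind, name) pair instead of overwriting a single entry. */
  static final class DotfileCollector implements PlannerTracker {
    final Map<String, String> dotfiles = new LinkedHashMap<>();

    @Override
    public void onBaseGraphPlan(String name, Graph graph) {
      dotfiles.put("base-graph." + name, graph.toDot());
    }

    @Override
    public void onSplitGraphPlan(String name, Graph graph) {
      dotfiles.put("split-graph." + name, graph.toDot());
    }
  }

  /** Stand-in for the planning loop: notifies every registered tracker once per iteration. */
  static final class Planner {
    private final List<PlannerTracker> trackers = new ArrayList<>();

    void register(PlannerTracker tracker) {
      trackers.add(tracker);
    }

    void plan(int iterations) {
      for (int i = 0; i < iterations; i++) {
        // Assumed naming scheme: qualify each event with the iteration index.
        String name = "iteration-" + i;
        Graph base = new Graph("base " + i);
        Graph split = new Graph("split " + i);
        for (PlannerTracker tracker : trackers) {
          tracker.onBaseGraphPlan(name, base);
          tracker.onSplitGraphPlan(name, split);
        }
      }
    }
  }

  /** Runs two planning iterations and returns everything the collector recorded. */
  static Map<String, String> demo() {
    DotfileCollector collector = new DotfileCollector();
    Planner planner = new Planner();
    planner.register(collector);
    planner.plan(2);
    return collector.dotfiles;
  }

  public static void main(String[] args) {
    demo().keySet().forEach(System.out::println);
  }
}
```

Because the planner only talks to the interface, the dotfile code stays out of the planning logic, and the same callbacks could feed a logger or a test assertion instead of a dotfile store.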


> Visualizations of some important internal/intermediate pipeline planning 
> states
> -------------------------------------------------------------------------------
>
>                 Key: CRUNCH-438
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-438
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.10.0, 0.8.3
>            Reporter: Christian Tzolov
>            Assignee: Christian Tzolov
>         Attachments: CRUNCH-438.2.patch, CRUNCH-438.patch
>
>
> To improve the understandability of the pipeline planning stages it would 
> help to visualize some intermediate planning states like:
> - PCollection lineage. (visualizing the output-pcollection-targets structure) 
> - MSCRPlanner's planning Graphs before and after the split up of dependent 
> GBK nodes
> - RTNode hierarchy along with the Input and Output configurations as 
> persisted in the Configuration before the execution of the pipeline. 
> Most of the information can be intercepted in the MSCRPlanner#plan() method.


