[
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=344612&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-344612
]
ASF GitHub Bot logged work on BEAM-8016:
----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Nov/19 22:10
Start Date: 15/Nov/19 22:10
Worklog Time Spent: 10m
Work Description: KevinGG commented on pull request #10132: [BEAM-8016]
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r347031607
##########
File path:
sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py
##########
@@ -136,14 +159,50 @@ def _generate_graph_dicts(self):
vertex_dict[invisible_leaf] = {'style': 'invis'}
self._edge_to_vertex_pairs[pcoll_id].append(
(transform.unique_name, invisible_leaf))
- edge_dict[(transform.unique_name, invisible_leaf)] = {}
+ if self._pin:
+ edge_label = {'label':
+ self._pin.cacheable_var_by_pcoll_id(pcoll_id)}
+ edge_dict[(transform.unique_name, invisible_leaf)] = edge_label
+ else:
+ edge_dict[(transform.unique_name, invisible_leaf)] = {}
+ # For PCollections with more than one consuming PTransform, we also add
+ # an invisible dummy node to diverge the edge in the middle as the
+ # single output is used by multiple down stream PTransforms as inputs
+ # instead of emitting multiple edges.
+ elif len(self._consumers[pcoll_id]) > 1:
+ intermediate_dummy = 'diverge{}'.format(
+ hash(pcoll_id) % 10000)
+ vertex_dict[intermediate_dummy] = {'shape': 'point',
+ 'width': '0'}
+ for consumer in self._consumers[pcoll_id]:
+ producer_name = transform.unique_name
+ consumer_name = transforms[consumer].unique_name
+ self._edge_to_vertex_pairs[pcoll_id].append(
+ (producer_name, intermediate_dummy))
+ if self._pin:
+ edge_dict[(producer_name, intermediate_dummy)] = {
+ 'arrowhead': 'none',
+ 'label':
+ self._pin.cacheable_var_by_pcoll_id(pcoll_id)}
+ else:
+ edge_dict[(producer_name, intermediate_dummy)] = {
+ 'arrowhead': 'none'}
+ self._edge_to_vertex_pairs[pcoll_id].append(
+ (intermediate_dummy, consumer_name))
+ edge_dict[(intermediate_dummy, consumer_name)] = {}
else:
for consumer in self._consumers[pcoll_id]:
producer_name = transform.unique_name
consumer_name = transforms[consumer].unique_name
self._edge_to_vertex_pairs[pcoll_id].append(
(producer_name, consumer_name))
- edge_dict[(producer_name, consumer_name)] = {}
+ if self._pin:
+ edge_dict[(producer_name, consumer_name)] = {
+ 'label':
+ self._pin.cacheable_var_by_pcoll_id(pcoll_id)
+ }
+ else:
+ edge_dict[(producer_name, consumer_name)] = {}
Review comment:
The entire logic will be simplified once data-centric user flow is in-place
and pcollections are rendered as nodes rather than labels on edges.
The difference introduced:
Before:

After:

In the future:
With data-centric user flow, notebook cell execution metadata and new Beam
pipeline graph proposal:

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 344612)
Time Spent: 20m (was: 10m)
> Render Beam Pipeline as DOT with Interactive Beam
> ---------------------------------------------------
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
> Issue Type: Improvement
> Components: runner-py-interactive
> Reporter: Ning Kang
> Assignee: Ning Kang
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be
> redundant or confusing to render arbitrary random sample PCollection data on
> edges.
> We'll also make sure edges in the graph corresponds to output -> input
> relationship in the user defined pipeline. Each edge is one output. If
> multiple down stream inputs take the same output, it should be rendered as
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part
> of the pipeline really executed from the original pipeline, we'll also
> provide the support in beta.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)