[ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332279
 ]

ASF GitHub Bot logged work on BEAM-8457:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Oct/19 23:05
            Start Date: 22/Oct/19 23:05
    Worklog Time Spent: 10m 
    Work Description: KevinGG commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#discussion_r337787464
 
 

 ##########
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##########
 @@ -360,6 +360,15 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
     """Remotely executes entire pipeline or parts reachable from node."""
+    # Label goog-notebook if pipeline is initiated from interactive runner.
+    from apache_beam.runners.interactive import interactive_runner
+    if isinstance(pipeline.runner, interactive_runner.InteractiveRunner):
 
 Review comment:
   I had missed the path where a new Pipeline is created and `run()` is invoked again.
   Yes, all of these are possible.
   I've added an `interactive` parameter, defaulting to `None`, to the `Pipeline` constructor. `run()` and `from_runner_api()` pass the `None` or `bool` value down no matter how the user chains the runners. I'm not very confident about the name, but the change should be backward compatible for Beam.
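The flag propagation described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the code in the PR: `ToyPipeline`, `ToyRunner`, `ToyInteractiveRunner` and `ToyDataflowRunner` are invented stand-ins for Beam's classes, and only the `run()` path is modelled (the same pass-through would apply to `from_runner_api()`).

```python
from typing import Optional


class ToyRunner:
    """Hypothetical stand-in for a runner; not Beam's runner API."""

    def run_pipeline(self, pipeline: "ToyPipeline") -> dict:
        # Report which runner executed the job and which flag value it saw.
        return {"runner": type(self).__name__, "interactive": pipeline.interactive}


class ToyInteractiveRunner(ToyRunner):
    pass


class ToyDataflowRunner(ToyRunner):
    pass


class ToyPipeline:
    """Hypothetical pipeline carrying an optional `interactive` flag."""

    def __init__(self, runner: ToyRunner, interactive: Optional[bool] = None):
        self.runner = runner
        # None is the default, so callers that never pass the flag are unaffected.
        self.interactive = interactive

    def run(self, runner: Optional[ToyRunner] = None) -> dict:
        if runner is None:
            return self.runner.run_pipeline(self)
        # Running with another runner builds a new pipeline; the flag (None or
        # bool) is passed down unchanged so the new runner can still see it.
        return ToyPipeline(runner, interactive=self.interactive).run()
```

For example, `ToyPipeline(ToyInteractiveRunner(), interactive=True).run(runner=ToyDataflowRunner())` reports `interactive: True` from the Dataflow stand-in, while a pipeline built without the flag still reports `None`, which is what keeps the change backward compatible.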
   
   Currently, I'm running into a problem in testing. Once I set `labels`, the Dataflow job fails immediately with an `Error processing pipeline.` error: there is no job graph, no worker starts, and there are no logs. It looks like when the job request contains a user label, Dataflow cannot convert the work item into its internal representation.
   
   I'll do some investigation and figure out why.
   
 
----------------------------------------------------------------


Issue Time Tracking
-------------------

    Worklog Id:     (was: 332279)
    Time Spent: 1h  (was: 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> ---------------------------------------------------------
>
>                 Key: BEAM-8457
>                 URL: https://issues.apache.org/jira/browse/BEAM-8457
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-py-interactive
>            Reporter: Ning Kang
>            Assignee: Ning Kang
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to accept a runner and an options 
> parameter so that a pipeline initially bundled with an interactive runner can 
> be run directly by other runners from a notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user calls p.run(runner=DataflowRunner()).
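Step 2 above (implicitly labelling the job) might look roughly like the following. This is a hypothetical sketch, not Beam's implementation: it assumes labels are held as a list of `key=value` strings, and both the function name `add_notebook_label` and the `goog-notebook=true` label are illustrative choices (the PR comment only says "Label goog-notebook").

```python
from typing import List, Optional

# Hypothetical label key, taken from the PR comment; the value "true" is an assumption.
NOTEBOOK_LABEL_KEY = "goog-notebook"


def add_notebook_label(labels: Optional[List[str]], interactive: Optional[bool]) -> List[str]:
    """Return a new list of 'key=value' labels, adding a notebook marker if needed."""
    result = list(labels or [])  # never mutate the caller's list
    if not interactive:
        # None (unknown) or False: leave the user's labels untouched.
        return result
    if any(label.split("=", 1)[0] == NOTEBOOK_LABEL_KEY for label in result):
        # Respect a label with the same key that the user already set.
        return result
    result.append(NOTEBOOK_LABEL_KEY + "=true")
    return result
```

Returning a copy and skipping keys the user already set keeps the implicit label from overriding anything supplied explicitly.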



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
