[ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=156472&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-156472
 ]

ASF GitHub Bot logged work on BEAM-5637:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Oct/18 20:33
            Start Date: 19/Oct/18 20:33
    Worklog Time Spent: 10m 
      Work Description: aaltay closed pull request #6747: [BEAM-5637] Improve docs on worker jar option and add verification.
URL: https://github.com/apache/beam/pull/6747
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/sdks/python/apache_beam/options/pipeline_options.py b/sdks/python/apache_beam/options/pipeline_options.py
index db7e7087ffe..4b4fbf640d2 100644
--- a/sdks/python/apache_beam/options/pipeline_options.py
+++ b/sdks/python/apache_beam/options/pipeline_options.py
@@ -554,7 +554,9 @@ def _add_argparse_args(cls, parser):
         '--dataflow_worker_jar',
         dest='dataflow_worker_jar',
         type=str,
-        help='Dataflow worker jar.'
+        help='Dataflow worker jar file. If specified, the jar file is staged '
+             'in GCS, then gets loaded by workers. End users usually '
+             'should not use this feature.'
     )
 
   def validate(self, validator):
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index 09e4f7d58da..ecaeda07c46 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -351,7 +351,7 @@ def run_pipeline(self, pipeline):
     setup_options.beam_plugins = plugins
 
     # Elevate "min_cpu_platform" to pipeline option, but using the existing
-    # experiment
+    # experiment.
     debug_options = pipeline._options.view_as(DebugOptions)
     worker_options = pipeline._options.view_as(WorkerOptions)
     if worker_options.min_cpu_platform:
@@ -384,6 +384,11 @@ def run_pipeline(self, pipeline):
 
     dataflow_worker_jar = getattr(worker_options, 'dataflow_worker_jar', None)
     if dataflow_worker_jar is not None:
+      if not apiclient._use_fnapi(pipeline._options):
+        logging.fatal(
+            'Typical end users should not use this worker jar feature. '
+            'It can only be used when fnapi is enabled.')
+
       experiments = ["use_staged_dataflow_worker_jar"]
       if debug_options.experiments is not None:
         experiments = list(set(experiments + debug_options.experiments))
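
For reference, a minimal sketch of how the staged worker jar option is
expected to be combined with the FnAPI experiment, which is what the new
check in dataflow_runner.py guards. This is not part of the merged diff;
the project, bucket and jar paths are placeholders, and the claim that
apiclient._use_fnapi keys off the 'beam_fn_api' experiment is an
assumption of this sketch.

from apache_beam.options.pipeline_options import (
    DebugOptions, PipelineOptions, WorkerOptions)

# All values below are placeholders.
options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-project',
    '--temp_location=gs://my-bucket/tmp',
    '--dataflow_worker_jar=/path/to/dataflow-worker.jar',
    '--experiments=beam_fn_api',  # without this, the check above logs a fatal error
])

# The runner reads the jar from WorkerOptions and the experiments from
# DebugOptions, mirroring run_pipeline() in the hunk above.
print(options.view_as(WorkerOptions).dataflow_worker_jar)
print(options.view_as(DebugOptions).experiments)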


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 156472)
    Time Spent: 5.5h  (was: 5h 20m)

> Python support for custom dataflow worker jar
> ---------------------------------------------
>
>                 Key: BEAM-5637
>                 URL: https://issues.apache.org/jira/browse/BEAM-5637
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Henning Rohde
>            Assignee: Ruoyun Huang
>            Priority: Major
>          Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Python jobs. That requires a change to the Python 
> boot code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104
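
Not part of the quoted issue text: a rough Python sketch of the kind of
filtering the description above calls for (the real boot code linked
there is Go). The staged artifact name 'dataflow-worker.jar' and the
helper name are assumed here purely for illustration.

# Hypothetical helper: drop the Dataflow worker jar from the list of staged
# artifacts the Python SDK harness should install, since that jar is meant
# only for the Java worker.
def artifacts_to_install(staged_artifacts):
    return [name for name in staged_artifacts
            if name != 'dataflow-worker.jar']

print(artifacts_to_install(
    ['workflow.tar.gz', 'dataflow-worker.jar', 'extra_package.tar.gz']))
# -> ['workflow.tar.gz', 'extra_package.tar.gz']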



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
