[
https://issues.apache.org/jira/browse/BEAM-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640114#comment-16640114
]
Udi Meiri commented on BEAM-5442:
---------------------------------
I believe PR 6557 broke integration tests using Dataflow.
Cloud console:
"Parsing unknown args:
[u'--dataflowJobId=2018-10-05_07_00_20-5526009939236014896',
u'--autoscalingAlgorithm=NONE', u'--direct_runner_use_stacked_bundle',
u'--maxNumWorkers=0', u'--style=scrambled', u'--sleep_secs=20',
u'--pipeline_type_check',
u'--gcpTempLocation=gs://temp-storage-for-end-to-end-tests/temp-it/beamapp-jenkins-1005140012-917021.1538748012.917145',
u'--numWorkers=1', u'--beam_plugins=apache_beam.io.filesystem.FileSystem',
u'--beam_plugins=apache_beam.io.hadoopfilesystem.HadoopFileSystem',
u'--beam_plugins=apache_beam.io.localfilesystem.LocalFileSystem',
u'--beam_plugins=apache_beam.io.gcp.gcsfilesystem.GCSFileSystem',
u'--beam_plugins=apache_beam.io.filesystem_test.TestingFileSystem',
u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.PipelineGraphRenderer',
u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.MuteRenderer',
u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.TextRenderer',
u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.PydotRenderer',
u'--pipelineUrl=gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1005140012-917021.1538748012.917145/pipeline.pb']"
"Python sdk harness failed:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker_main.py", line 133, in main
    sdk_pipeline_options.get_all_options(drop_default=True))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/options/pipeline_options.py", line 224, in get_all_options
    parser.add_argument(arg.split('=', 1)[0], nargs='?')
  File "/usr/lib/python2.7/argparse.py", line 1308, in add_argument
    return self._add_action(action)
  File "/usr/lib/python2.7/argparse.py", line 1682, in _add_action
    self._optionals._add_action(action)
  File "/usr/lib/python2.7/argparse.py", line 1509, in _add_action
    action = super(_ArgumentGroup, self)._add_action(action)
  File "/usr/lib/python2.7/argparse.py", line 1322, in _add_action
    self._check_conflict(action)
  File "/usr/lib/python2.7/argparse.py", line 1460, in _check_conflict
    conflict_handler(action, confl_optionals)
  File "/usr/lib/python2.7/argparse.py", line 1467, in _handle_conflict_error
    raise ArgumentError(action, message % conflict_string)
ArgumentError: argument --beam_plugins: conflicting option string(s): --beam_plugins"
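
The traceback above suggests that each unknown argument is registered with argparse individually, so a flag that appears more than once (here --beam_plugins) is added twice and the second add_argument call raises ArgumentError. The following is a minimal sketch of that failure and of one possible way around it, not the actual Beam code or fix. The helper names register_naive and register_deduplicated, the shortened argument list, and the choice of action='append' are assumptions made for illustration.

```python
import argparse

# A shortened, illustrative version of the unknown args from the log above.
unknown_args = [
    '--numWorkers=1',
    '--beam_plugins=apache_beam.io.filesystem.FileSystem',
    '--beam_plugins=apache_beam.io.localfilesystem.LocalFileSystem',
    '--pipeline_type_check',
]


def register_naive(parser, args):
    """Hypothetical helper: registers every arg, as the failing line in the traceback does."""
    for arg in args:
        parser.add_argument(arg.split('=', 1)[0], nargs='?')


def register_deduplicated(parser, args):
    """Hypothetical helper: registers each flag name once and collects repeated values."""
    seen = set()
    for arg in args:
        name = arg.split('=', 1)[0]
        if name in seen:
            continue
        seen.add(name)
        # action='append' keeps every value of a repeated flag;
        # nargs='?' still allows flags that carry no value.
        parser.add_argument(name, nargs='?', action='append')


# Registering the same option string twice raises ArgumentError.
naive_failed = False
try:
    register_naive(argparse.ArgumentParser(), unknown_args)
except argparse.ArgumentError:
    naive_failed = True

# Registering each name once lets the same argument list parse cleanly.
parser = argparse.ArgumentParser()
register_deduplicated(parser, unknown_args)
parsed = parser.parse_args(unknown_args)
```

In this sketch every parsed value is a list, so a caller that expects a scalar for single-valued flags would need to unwrap it. Whether that trade-off is acceptable for get_all_options is a separate question.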
----
Test output:
07:28:37 ======================================================================
07:28:37 FAIL: test_streaming_with_attributes (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest)
07:28:37 ----------------------------------------------------------------------
07:28:37 Traceback (most recent call last):
07:28:37   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 172, in test_streaming_with_attributes
07:28:37     self._test_streaming(with_attributes=True)
07:28:37   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 164, in _test_streaming
07:28:37     timestamp_attribute=self.TIMESTAMP_ATTRIBUTE)
07:28:37   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py", line 91, in run_pipeline
07:28:37     result = p.run()
07:28:37   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py", line 416, in run
07:28:37     return self.runner.run_pipeline(self)
07:28:37   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 65, in run_pipeline
07:28:37     hc_assert_that(self.result, pickler.loads(on_success_matcher))
07:28:37 AssertionError:
07:28:37 Expected: (Test pipeline expected terminated in state: RUNNING and Expected 2 messages.)
07:28:37      but: Expected 2 messages. Got 0 messages. Diffs (item, count):
07:28:37   Expected but not in actual: [(PubsubMessage(data001-seen, {'processed': 'IT'}), 1), (PubsubMessage(data002-seen, {'timestamp_out': '2018-07-11T02:02:50.149000Z', 'processed': 'IT'}), 1)]
07:28:37   Unexpected: []
07:28:37   Stripped attributes: ['id', 'timestamp']
07:28:37
07:28:37 -------------------- >> begin captured stdout << ---------------------
07:28:37 Found: https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-05_07_00_20-5526009939236014896?project=apache-beam-testing.
07:28:37
07:28:37 --------------------- >> end captured stdout << ----------------------
> PortableRunner swallows custom options for Runner
> -------------------------------------------------
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core, sdk-py-core
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Priority: Major
> Labels: portability, portability-flink
> Fix For: 2.8.0
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the
> PortableRunner.