[
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=401057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-401057
]
ASF GitHub Bot logged work on BEAM-9446:
----------------------------------------
Author: ASF GitHub Bot
Created on: 10/Mar/20 23:55
Start Date: 10/Mar/20 23:55
Worklog Time Spent: 10m
Work Description: ibzib commented on issue #11052: [BEAM-9446] Add
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-597375999
> Have you tried working around this by not discarding these options? AFAIK
the JSON parser is smart enough to read the stringified version of all option
values.
I think this may be the best strategy for the uber jar job server; however, I
don't think we should change this behavior for other runners. (Not sure if
that's what you were proposing, just organizing my thoughts here:)
- In Dataflow, we seem to duplicate every runner option for each SDK,
perhaps because there is no better choice due to the runner architecture. In
that case, since all the args are presumably known by the SDK, it makes more
sense to drop them (status quo) or maybe even error when arguments are unknown,
because it usually means the user made a mistake.
- With the "old" Flink job server, retrieving args from the job server is an
adequate workaround, so again, there should be no need for unrecognized
arguments.
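
To make the three behaviors above concrete (drop unknown options, error on them, or pass them through untouched), here is a minimal sketch. It is not Beam's actual implementation; `filter_options`, the policy names, and the example option sets are all hypothetical and only illustrate the trade-off.

```python
from typing import Dict, Iterable


def filter_options(options: Dict[str, object],
                   known: Iterable[str],
                   policy: str = "drop") -> Dict[str, object]:
    """Apply one of three hypothetical policies to unrecognized options.

    "drop":         silently discard unknown options (the status quo).
    "error":        fail fast, since an unknown option is usually a user mistake.
    "pass_through": keep everything and let the runner interpret it later.
    """
    if policy not in ("drop", "error", "pass_through"):
        raise ValueError(f"Unknown policy: {policy}")

    known_set = set(known)
    unknown = sorted(k for k in options if k not in known_set)

    if policy == "pass_through":
        return dict(options)
    if policy == "error" and unknown:
        raise ValueError("Unrecognized pipeline options: " + ", ".join(unknown))
    # "drop", or "error" with nothing unknown: keep only recognized options.
    return {k: v for k, v in options.items() if k in known_set}
```

With `{"runner": "FlinkRunner", "parallelism": 2}` and only `runner` known, `"drop"` loses `parallelism`, `"error"` raises, and `"pass_through"` forwards it, which is the behavior the uber jar job server would need.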
I discussed this with @angoenka today and he suggested that we consider the
runner-Flink boundary as well, i.e., whether we should have some way of letting
_all_ Flink environment options be set through Beam pipeline options instead
of just adding the ones we need as we go. This would potentially save users
from having to wait for a new release just for us to add a pipeline option that
trivially maps 1:1 to Flink. (Of course, they can always change Flink's conf
files, which was going to be my proposed workaround here, but AFAIK that
requires a restart of the cluster and would affect all jobs run on the
cluster.) WDYT?
+cc @tweise
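
One possible shape for such a generic pass-through, sketched under assumptions: a single repeatable pipeline option carrying `key=value` strings that are forwarded to Flink as-is. The function name `parse_flink_conf` and the idea of a dedicated option are hypothetical and do not exist in Beam; the keys in the usage note are ordinary Flink configuration keys used only as examples.

```python
from typing import Dict, List


def parse_flink_conf(entries: List[str]) -> Dict[str, str]:
    """Turn ["key=value", ...] into a dict of Flink configuration entries.

    Values are kept as strings; interpreting them is left to Flink, so no
    new Beam option is needed for each Flink setting.
    """
    conf: Dict[str, str] = {}
    for entry in entries:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = entry.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"Expected key=value, got: {entry!r}")
        conf[key.strip()] = value.strip()
    return conf
```

For example, `parse_flink_conf(["parallelism.default=4", "state.backend=rocksdb"])` yields a per-job mapping that could be applied without editing the cluster's conf files or restarting it.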
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Issue Time Tracking
-------------------
Worklog Id: (was: 401057)
Time Spent: 1h 40m (was: 1.5h)
> FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
> ------------------------------------------------------------------------------
>
> Key: BEAM-9446
> URL: https://issues.apache.org/jira/browse/BEAM-9446
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Labels: portability-flink
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> I need these options for TFX, but they're being discarded (I believe they are
> normally supplied by the job server).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)