[
https://issues.apache.org/jira/browse/SPARK-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Wendell updated SPARK-2678:
-----------------------------------
Fix Version/s: 1.0.3
> `Spark-submit` overrides user application options
> -------------------------------------------------
>
> Key: SPARK-2678
> URL: https://issues.apache.org/jira/browse/SPARK-2678
> Project: Spark
> Issue Type: Bug
> Components: Deploy
> Affects Versions: 1.0.1, 1.0.2
> Reporter: Cheng Lian
> Assignee: Cheng Lian
> Priority: Blocker
> Fix For: 1.1.0, 1.0.3
>
>
> Here is an example:
> {code}
> ./bin/spark-submit --class Foo some.jar --help
> {code}
> Since {{--help}} appears after the primary resource (i.e. {{some.jar}}), it
> should be recognized as a user application option. But it is actually
> overridden by {{spark-submit}}, which shows its own help message instead.
> When directly invoking {{spark-submit}}, the constraints here are:
> # Options before primary resource should be recognized as {{spark-submit}}
> options
> # Options after primary resource should be recognized as user application
> options
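The two constraints above amount to scanning the argument list left to right and splitting it at the first token that is neither an option nor an option's value; that token is the primary resource. A minimal sketch in Python (the function name and the table of value-taking options are hypothetical illustrations, not Spark's actual parser):

```python
def split_args(args, opts_with_value=("--class", "--master", "--name")):
    """Split argv into (spark-submit opts, primary resource, app opts).

    Scans left to right; the first token that is neither an option nor
    the value of an option that takes one is the primary resource.
    """
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("-"):
            # Skip the option's value if this option takes one (e.g. --class Foo).
            i += 2 if arg in opts_with_value else 1
        else:
            return args[:i], arg, args[i + 1:]
    return args, None, []

# --help comes after some.jar, so it belongs to the application:
submit_opts, primary, app_opts = split_args(
    ["--class", "Foo", "some.jar", "--help"])
```

Note that the parser must know which options take values; otherwise {{Foo}} in {{--class Foo}} would be mistaken for the primary resource.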
> The tricky part is how to handle scripts like {{spark-shell}} that delegate
> to {{spark-submit}}. These scripts allow users to specify both
> {{spark-submit}} options like {{--master}} and user-defined application
> options together. For example, to write a new script
> {{start-thriftserver.sh}} that starts the Hive Thrift server, we may do
> this:
> {code}
> $SPARK_HOME/bin/spark-submit --class
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal $@
> {code}
> Then users may call this script like this:
> {code}
> ./sbin/start-thriftserver.sh --master spark://some-host:7077 --hiveconf
> key=value
> {code}
> Notice that all options are captured by {{$@}}. If we put {{$@}} before
> {{spark-internal}}, all of them are recognized as {{spark-submit}} options,
> so {{--hiveconf}} won't be passed to {{HiveThriftServer2}}; if we put it
> after {{spark-internal}}, they *should* all be recognized as
> {{HiveThriftServer2}} options, but because of this bug, {{--master}} is
> still recognized as a {{spark-submit}} option, which happens to produce the
> desired behavior here.
> Although all scripts that currently use {{spark-submit}} work correctly, we
> should still fix this bug: it causes option name collisions between
> {{spark-submit}} and user applications, so every time we add a new option
> to {{spark-submit}}, some existing user applications may break. However,
> fixing this bug may require incompatible changes.
> The suggested solution is to use {{--}} as a separator between
> {{spark-submit}} options and user application options. For the Hive Thrift
> server example above, the user would call it this way:
> {code}
> ./sbin/start-thriftserver.sh --master spark://some-host:7077 -- --hiveconf
> key=value
> {code}
> {{SparkSubmitArguments}} should then be responsible for splitting the two
> sets of options and passing them on correctly.
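With an explicit {{--}} separator, the splitting logic becomes trivial: everything before the first standalone {{--}} goes to {{spark-submit}}, everything after it goes to the application, with no need to know which options take values. A minimal sketch in Python (a hypothetical illustration, not the actual {{SparkSubmitArguments}} code):

```python
def split_at_separator(args):
    """Split argv at the first standalone '--' token."""
    if "--" in args:
        i = args.index("--")
        return args[:i], args[i + 1:]
    # No separator present: treat everything as spark-submit options.
    return args, []

submit_opts, app_opts = split_at_separator(
    ["--master", "spark://some-host:7077", "--", "--hiveconf", "key=value"])
```

This mirrors the common POSIX convention where {{--}} marks the end of options, so option names can never collide between {{spark-submit}} and the user application.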
--
This message was sent by Atlassian JIRA
(v6.2#6252)