Cheng Lian created SPARK-2678:
---------------------------------

             Summary: `Spark-submit` overrides user application options
                 Key: SPARK-2678
                 URL: https://issues.apache.org/jira/browse/SPARK-2678
             Project: Spark
          Issue Type: Bug
          Components: Deploy
    Affects Versions: 1.0.1, 1.0.2
            Reporter: Cheng Lian
            Priority: Minor


Here is an example:
{code}
./bin/spark-submit --class Foo some.jar --help
{code}
Since {{--help}} appears after the primary resource (i.e. {{some.jar}}), it 
should be recognized as a user application option. But it is actually 
intercepted by {{spark-submit}}, which shows its own help message instead.

When directly invoking {{spark-submit}}, the constraints here are:

# Options before the primary resource should be recognized as {{spark-submit}} 
options
# Options after the primary resource should be recognized as user application 
options
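The rule above can be sketched as a tiny argument splitter. This is a hypothetical illustration, not the actual {{spark-submit}} parsing code; for simplicity only {{--class}} and {{--master}} are modeled as value-taking options, and the example command from this report is hard-coded via {{set --}}:
{code}
# Hypothetical sketch of the intended rule: options before the primary
# resource belong to spark-submit, everything after it belongs to the
# user application. Only --class and --master take a value here.
set -- --class Foo some.jar --help   # the example from the report

submit_opts=""; app_opts=""; primary=""; expect_value=false
for arg in "$@"; do
  if [ -n "$primary" ]; then
    app_opts="$app_opts $arg"          # after the primary resource: user option
  elif [ "$expect_value" = true ]; then
    submit_opts="$submit_opts $arg"    # value of the preceding spark-submit option
    expect_value=false
  else
    case "$arg" in
      --class|--master) submit_opts="$submit_opts $arg"; expect_value=true ;;
      -*)               submit_opts="$submit_opts $arg" ;;
      *)                primary="$arg" ;;  # first bare token = primary resource
    esac
  fi
done

echo "submit:$submit_opts"
echo "primary:$primary"
echo "app:$app_opts"
{code}
Run under {{sh}}, this prints {{some.jar}} as the primary resource, {{--class Foo}} as a {{spark-submit}} option, and {{--help}} as a user application option, which is the behavior the report argues for.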

The tricky part is how to handle scripts like {{spark-shell}} that delegate to 
{{spark-submit}}. These scripts allow users to specify {{spark-submit}} options 
like {{--master}} together with user-defined application options. For example, 
say we'd like to write a new script {{start-thriftserver.sh}} to start the Hive 
Thrift server; basically we may do this:
{code}
$SPARK_HOME/bin/spark-submit --class 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal $@
{code}
Then user may call this script like:
{code}
./sbin/start-thriftserver.sh --master spark://some-host:7077 --hiveconf 
key=value
{code}
Notice that all options are captured by {{$@}}. If we put it before 
{{spark-internal}}, all of them are recognized as {{spark-submit}} options, so 
{{--hiveconf}} is never passed to {{HiveThriftServer2}}; if we put it after 
{{spark-internal}}, they *should* all be recognized as options of 
{{HiveThriftServer2}}, but because of this bug, {{--master}} is still 
recognized as a {{spark-submit}} option, which happens to produce the right 
behavior.

Although all existing scripts that use {{spark-submit}} currently work 
correctly, we should still fix this bug: it causes option name collisions 
between {{spark-submit}} and user applications, so every time we add a new 
option to {{spark-submit}}, some existing user applications may break. Fixing 
it, however, may require incompatible changes.

The suggested solution is to use {{--}} as a separator between {{spark-submit}} 
options and user application options. For the Hive Thrift server example above, 
the user would call the script like this:
{code}
./sbin/start-thriftserver.sh --master spark://some-host:7077 -- --hiveconf 
key=value
{code}
{{SparkSubmitArguments}} should then be responsible for splitting the two sets 
of options and passing them on correctly.
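The proposed separator semantics can be sketched in a few lines of shell. This is only an illustration of the splitting rule, not the actual {{SparkSubmitArguments}} change; the Thrift server invocation from this report is hard-coded via {{set --}}:
{code}
# Hypothetical sketch of the proposed "--" separator: everything before the
# first "--" is a spark-submit option, everything after it belongs to the
# user application. The separator itself is dropped.
set -- --master spark://some-host:7077 -- --hiveconf key=value

submit_opts=""; app_opts=""; seen_sep=false
for arg in "$@"; do
  if [ "$seen_sep" = true ]; then
    app_opts="$app_opts $arg"          # after "--": user application option
  elif [ "$arg" = "--" ]; then
    seen_sep=true                      # swallow the separator
  else
    submit_opts="$submit_opts $arg"    # before "--": spark-submit option
  fi
done

echo "submit:$submit_opts"
echo "app:$app_opts"
{code}
With this rule there is no ambiguity: {{--master}} always reaches {{spark-submit}} and {{--hiveconf key=value}} always reaches the user application, regardless of which option names the two sides happen to share.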



--
This message was sent by Atlassian JIRA
(v6.2#6252)