[
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317066#comment-16317066
]
Sahil Takiar commented on HIVE-16484:
-------------------------------------
[~xuefuz] thanks for voicing your concern. I see a few benefits to doing this:
* The main benefit is the usage of {{InProcessLauncher}} which was added in
SPARK-11035
** I didn't add the integration with {{InProcessLauncher}} to this patch mainly
because I didn't want the diff to get too big; I plan to add integration with
{{InProcessLauncher}} in another JIRA
** The {{InProcessLauncher}} avoids running {{bin/spark-submit}}; it calls
{{SparkSubmit#main}} directly, which decreases the amount of time it takes to
start a HoS session because a separate process doesn't need to be launched to
start the Spark app
** It also makes HoS easier to debug because everything runs in a single
process, so we don't have to rely on redirecting stdout / stderr output
streams, etc.
* The API is much cleaner than building up command line arguments for
{{bin/spark-submit}}
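To illustrate the last point, here is a minimal sketch of the argument-building style that a launcher API would replace. It is not the code in the patch or in {{SparkClientImpl}}: the class name {{SparkSubmitCommand}}, the paths, the main class, and the config values are all made up for the example. The trailing comment shows roughly how the same settings would look with the {{SparkLauncher}} builder methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from Hive.
final class SparkSubmitCommand {

  // Builds the argv for bin/spark-submit by hand. Every option is a pair of
  // untyped strings, and ordering matters: the app jar must come after the
  // spark-submit options.
  static List<String> build(String sparkHome, String master, String deployMode,
      String mainClass, String appJar, Map<String, String> conf) {
    List<String> argv = new ArrayList<>();
    argv.add(sparkHome + "/bin/spark-submit");
    argv.add("--master");
    argv.add(master);
    argv.add("--deploy-mode");
    argv.add(deployMode);
    argv.add("--class");
    argv.add(mainClass);
    for (Map.Entry<String, String> e : conf.entrySet()) {
      argv.add("--conf");
      argv.add(e.getKey() + "=" + e.getValue());
    }
    argv.add(appJar);
    return argv;
  }

  public static void main(String[] args) {
    // Assumed example values; none of these come from the patch.
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("spark.executor.memory", "2g");
    List<String> argv = build("/opt/spark", "yarn", "cluster",
        "org.example.RemoteDriver", "/tmp/example-app.jar", conf);

    // Spawning the separate process would then look like:
    //   new ProcessBuilder(argv).redirectErrorStream(true).start();
    System.out.println(String.join(" ", argv));

    // With org.apache.spark.launcher.SparkLauncher the same settings would be
    // expressed roughly as:
    //   new SparkLauncher()
    //       .setSparkHome("/opt/spark")
    //       .setMaster("yarn")
    //       .setDeployMode("cluster")
    //       .setMainClass("org.example.RemoteDriver")
    //       .setAppResource("/tmp/example-app.jar")
    //       .setConf("spark.executor.memory", "2g")
    //       .startApplication();
  }
}
```

The hand-built list works, but every new option means more string handling, while the builder keeps each setting behind a named method.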
Some other thoughts:
{quote}Moreover, security related stuff will need more testing at least.{quote}
I'm not that familiar with the security aspects of HoS, but I can add
some tests with {{MiniHiveKdc}} / doAs to check if things are still good.
{quote}I'd feel nervous in completely different code path which is so critical{quote}
Valid point, but the code path isn't that different, at the end of the
day everything is going through {{SparkSubmit.scala}}.
{quote}we can make a switch in later releases{quote}
I don't think we have plans to release Hive 3.0.0 anytime soon, so we can fix
any issues with {{SparkLauncher}} before the release.
Let me know your thoughts.
> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> --------------------------------------------------------------------
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.10.patch,
> HIVE-16484.2.patch, HIVE-16484.3.patch, HIVE-16484.4.patch,
> HIVE-16484.5.patch, HIVE-16484.6.patch, HIVE-16484.7.patch,
> HIVE-16484.8.patch, HIVE-16484.9.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}}
> directory and invokes the {{bin/spark-submit}} script, which spawns a
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS session --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners