[
https://issues.apache.org/jira/browse/SPARK-27455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-27455:
----------------------------------
Affects Version/s: 3.0.0 (was: 2.4.1)
> spark-submit and friends should allow main artifact to be specified as a
> package
> --------------------------------------------------------------------------------
>
> Key: SPARK-27455
> URL: https://issues.apache.org/jira/browse/SPARK-27455
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Brian Lindblom
> Assignee: Brian Lindblom
> Priority: Minor
>
> Spark already has the ability to provide spark.jars.packages in order to
> include a set of required dependencies for an application. It will
> transitively resolve any provided packages via ivy, cache those artifacts,
> and serve them via the driver to launched executors. It would be useful to
> take this one step further and allow a spark.jars.main.package setting and a
> corresponding command-line flag, --main-package, eliminating the need to
> specify a specific jar file (which is NOT transitively resolved). This
> could simplify many use cases. Additionally, --main-package could trigger
> inspection of the artifact's META-INF to determine the main class, obviating
> the need for spark-submit invocations to specify it directly.
> Currently, I've found that I can do
> {{spark-submit --packages com.example:my-package:1.0.0 --class
> com.example.MyPackage /path/to/mypackage-1.0.0.jar <my_args>}}
> to achieve the same effect. This additional boilerplate, however, seems
> unnecessary, especially considering one must fetch/orchestrate the jar into
> some location (local or remote) in addition to specifying any dependencies.
> Resorting to fat jars to simplify this creates other issues.
> Ideally
> {{spark-submit --repository <url_to_my_repo> --main-package
> com.example:my-package:1.0.0 <my_args>}}
> would be all that is necessary to bootstrap an application. Obviously, care
> must be taken to avoid DoS'ing <url_to_my_repo> when orchestrating many Spark
> applications. To that end, it may also be desirable to implement a
> --repository-cache-uri <uri_to_repository_cache> flag: where HDFS is
> available, for example, the application could be bootstrapped once and the
> resolved artifacts cached in HDFS as a single larger archive for later
> consumption (e.g. by zipping/tarring up the ivy cache itself).
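The META-INF inspection proposed above can be sketched as follows. This is an illustrative sketch only, not Spark code: a jar is an ordinary zip archive, so the Main-Class attribute can be read straight out of META-INF/MANIFEST.MF. The function name find_main_class and the demo jar are hypothetical.

```python
# Illustrative sketch (not Spark API): read the Main-Class attribute from a
# jar's manifest, as --main-package could do to avoid requiring --class.
import zipfile


def find_main_class(jar_path):
    """Return the Main-Class attribute from a jar's MANIFEST.MF, or None."""
    with zipfile.ZipFile(jar_path) as jar:
        manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    for line in manifest.splitlines():
        if line.startswith("Main-Class:"):
            return line.split(":", 1)[1].strip()
    return None


# Build a minimal demo jar on the fly (a jar is just a zip archive).
with zipfile.ZipFile("demo.jar", "w") as jar:
    jar.writestr(
        "META-INF/MANIFEST.MF",
        "Manifest-Version: 1.0\r\nMain-Class: com.example.MyPackage\r\n",
    )

print(find_main_class("demo.jar"))  # → com.example.MyPackage
```

With something like this in the launcher, the spark-submit invocation in the example above would need neither the --class flag nor a local path to the jar.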
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]