Would you mind filing a JIRA for this? Thanks!

Cheng

On 6/11/15 2:40 PM, Dong Lei wrote:

I think in standalone cluster mode, Spark is supposed to:

1. Download the jars and files to the driver

2. Set the driver’s classpath

3. Have the driver set up an HTTP file server to distribute these files

4. Have each worker download them from the driver and set up its own classpath

Right?
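
Roughly, I picture the flow like this (illustrative stubs only, not the actual Spark internals):

    import java.io.File

    // Illustrative stubs -- my mental model, not Spark's real code.
    def download(uri: String): File = ???             // step 1: fetch a jar/file to the driver
    def addToDriverClasspath(f: File): Unit = ???     // step 2: extend the driver's classpath
    def serveOverHttp(files: Seq[File]): String = ??? // step 3: start the driver's HTTP file server

    def prepareDriver(deps: Seq[String]): String = {
      val local = deps.map(download)
      local.foreach(addToDriverClasspath)
      // step 4: workers fetch from the returned URL and set up their classpaths
      serveOverHttp(local)
    }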

But somehow, the first step fails.

Even if I can make the first step work (using option 1), it seems that the classpath on the driver is still not set correctly.

Thanks

Dong Lei

*From:* Cheng Lian [mailto:lian.cs....@gmail.com]
*Sent:* Thursday, June 11, 2015 2:32 PM
*To:* Dong Lei
*Cc:* Dianfei (Keith) Han; dev@spark.apache.org
*Subject:* Re: How to support dependency jars and files on HDFS in standalone cluster mode?

Oh sorry, I mistook --jars for --files. Yeah, for jars we need to add them to the classpath, which is different from regular files.

Cheng

On 6/11/15 2:18 PM, Dong Lei wrote:

    Thanks Cheng,

    If I do not use --jars, how can I tell Spark to look for the
    jars (and files) on HDFS?

    Do you mean the driver will not need to set up an HTTP file
    server for this scenario, and the workers will fetch the jars
    and files directly from HDFS?

    Thanks

    Dong Lei

    *From:* Cheng Lian [mailto:lian.cs....@gmail.com]
    *Sent:* Thursday, June 11, 2015 12:50 PM
    *To:* Dong Lei; dev@spark.apache.org
    *Cc:* Dianfei (Keith) Han
    *Subject:* Re: How to support dependency jars and files on HDFS in
    standalone cluster mode?

    Since the jars are already on HDFS, you can access them directly
    in your Spark application without using --jars.
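
    For example, something along these lines inside the application
    (the app name is just a placeholder; the jar path is taken from
    your example):

        import org.apache.spark.{SparkConf, SparkContext}

        // Add a jar that already lives on HDFS from inside the app;
        // executors fetch it themselves, no --jars needed.
        val sc = new SparkContext(new SparkConf().setAppName("MyApp"))
        sc.addJar("hdfs://ip/1.jar")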

    Cheng

    On 6/11/15 11:04 AM, Dong Lei wrote:

        Hi spark-dev:

        I cannot use an HDFS location for the “--jars” or “--files”
        option when doing spark-submit in standalone cluster mode.
        For example:

            spark-submit … --jars hdfs://ip/1.jar … hdfs://ip/app.jar   (standalone cluster mode)

        will not download 1.jar to the driver’s HTTP file server (but
        app.jar will be downloaded to the driver’s directory).

        I figured out that the reason Spark does not download the jars
        is that when sc.addJar adds them to the HTTP file server, the
        function called is Files.copy, which does not support remote
        locations.
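
        For illustration, my understanding is that the copy boils down
        to something like this (the destination directory below is
        hypothetical):

            import java.nio.file.{Files, Paths}

            // java.nio.file.Files.copy resolves both arguments against
            // the local default filesystem, so an hdfs:// URL is treated
            // as a (nonexistent) local path and the copy fails instead
            // of fetching from HDFS.
            Files.copy(
              Paths.get("hdfs://ip/1.jar"),              // not a valid local file
              Paths.get("/tmp/spark-http-files/1.jar"))  // hypothetical server dir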

        And I think even if Spark could download the jars and add them
        to the HTTP file server, the classpath would still not be set
        correctly, because it would contain the remote locations.

        So I’m trying to make this work and have come up with two
        options, but neither of them seems elegant, and I would like
        to hear your advice:

        Option 1:

        Modify HTTPFileServer.addFileToDir so that it recognizes an
        “hdfs” prefix.

        This is not good because I think it goes beyond the scope of
        the HTTP file server.
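
        For example, the change might look roughly like this (a sketch
        only, using the Hadoop FileSystem API; the method shape is
        illustrative, not the real addFileToDir signature):

            import java.io.File
            import java.net.URI
            import java.nio.file.{Files, Paths}
            import org.apache.hadoop.conf.Configuration
            import org.apache.hadoop.fs.{FileSystem, Path => HPath}

            // Sketch: copy `path` into `dir`, fetching from HDFS when needed.
            def copyToDir(path: String, dir: File): File = {
              val dest = new File(dir, new HPath(path).getName)
              if (path.startsWith("hdfs://")) {
                val fs = FileSystem.get(new URI(path), new Configuration())
                fs.copyToLocalFile(new HPath(path), new HPath(dest.getAbsolutePath))
              } else {
                Files.copy(Paths.get(path), dest.toPath) // current local-only behavior
              }
              dest
            }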

        Option 2:

        Modify DriverRunner.downloadUserJar so that it downloads all
        the “--jars” and “--files” along with the application jar.

        This sounds more reasonable than option 1 for downloading
        files. But this way I need to read “spark.jars” and
        “spark.files” in downloadUserJar or DriverRunner.start and
        replace them with local paths. How can I do that?
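
        What I have in mind is roughly this (a hypothetical sketch;
        the helper names are illustrative):

            import java.io.File
            import java.net.URI
            import org.apache.hadoop.conf.Configuration
            import org.apache.hadoop.fs.{FileSystem, Path => HPath}
            import org.apache.spark.SparkConf

            // Hypothetical helper: fetch one hdfs:// URI into workDir
            // and return the resulting local path.
            def downloadToDir(uri: String, workDir: File): String = {
              val src = new HPath(uri)
              val dst = new File(workDir, src.getName)
              FileSystem.get(new URI(uri), new Configuration())
                .copyToLocalFile(src, new HPath(dst.getAbsolutePath))
              dst.getAbsolutePath
            }

            // Rewrite "spark.jars" / "spark.files" to point at the local copies.
            def localizeDeps(conf: SparkConf, workDir: File): Unit = {
              for (key <- Seq("spark.jars", "spark.files")) {
                conf.getOption(key).foreach { value =>
                  val localized = value.split(",").filter(_.nonEmpty).map { u =>
                    if (u.startsWith("hdfs://")) downloadToDir(u, workDir) else u
                  }
                  conf.set(key, localized.mkString(","))
                }
              }
            }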

        Do you have a more elegant solution, or is there a plan to
        support this in the future?

        Thanks

        Dong Lei

