I'm definitely only talking about non-embedded uses here, as I also use
embedded Spark (also Cassandra and Kafka) to run tests. This is almost always
safe since everything is in the same JVM. It's only once we launch against
a real distributed environment that we end up with issues.

Since PySpark uses spark-submit in the Java gateway, I'm not sure if that
matters :)

The cases I see usually go through main directly, adding jars
programmatically.

This usually ends up with classpath errors (Spark not on the classpath, their
jar not on the classpath, dependencies not on the classpath),
conf errors (executors have the incorrect environment, executor classpath
broken, not understanding that spark-defaults won't do anything),
jar version mismatches,
etc ...
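
For comparison, here is a rough sketch of what going through spark-submit
looks like when driven from a script. It only assembles the command line and
does not run anything. The helper name, jar names, class name, and master URL
are all made up for illustration; the only things I'm relying on are the
standard --master, --class, --jars, and --conf options.

```python
import shlex


def build_spark_submit_command(app_jar, main_class, master, extra_jars=(), conf=None):
    """Assemble a spark-submit argument list (hypothetical helper, runs nothing)."""
    cmd = ["spark-submit", "--master", master, "--class", main_class]
    if extra_jars:
        # --jars takes a single comma-separated list of additional jars
        cmd += ["--jars", ",".join(extra_jars)]
    # Each setting is passed as its own --conf key=value pair, sorted by key
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    # The application jar goes last, after all spark-submit options
    cmd.append(app_jar)
    return cmd


if __name__ == "__main__":
    # All values below are placeholders, not taken from any real deployment
    command = build_spark_submit_command(
        app_jar="my-app.jar",
        main_class="com.example.MyApp",
        master="spark://master-host:7077",
        extra_jars=["dep-a.jar", "dep-b.jar"],
        conf={"spark.executor.memory": "2g"},
    )
    print(shlex.join(command))
```

The point is just that the jars and conf get handed to spark-submit rather
than being wired up by hand inside main.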

On Mon, Oct 10, 2016 at 10:05 AM Sean Owen <so...@cloudera.com> wrote:

> I have also 'embedded' a Spark driver without much trouble. It isn't that
> it can't work.
>
> The Launcher API is probably the recommended way to do that though.
> spark-submit is the way to go for non-programmatic access.
>
> If you're not doing one of those things and it is not working, yeah I
> think people would tell you you're on your own. I think that's consistent
> with all the JIRA discussions I have seen over time.
>
>
> On Mon, Oct 10, 2016, 17:33 Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
> I've seen a variety of users attempting to work around using Spark Submit
> with at best middling levels of success. I think it would be helpful if the
> project had a clear statement that submitting an application without using
> Spark Submit is truly for experts only or is unsupported entirely.
>
> I know this is a pretty strong stance and other people have had different
> experiences than me so please let me know what you think :)
>
>
