[ https://issues.apache.org/jira/browse/SPARK-19369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sidney Feiner updated SPARK-19369:
----------------------------------
    Description: 
Trying to migrate from Spark 1.6 to 2.1, I've stumbled upon a small problem: my 
SparkContext doesn't get its configurations from the SparkConf object. Before 
passing the SparkConf to the SparkContext constructor, I've made sure my 
configurations are set.

I've done some digging and this is what I've found:

When I initialize the SparkContext, the following code is executed:

def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
             conf, jsc, profiler_cls):
    self.environment = environment or {}
    if conf is not None and conf._jconf is not None:
        self._conf = conf
    else:
        self._conf = SparkConf(_jvm=SparkContext._jvm)


So I can see that the only way my SparkConf will be used is if it also has a 
_jvm object.
I've used spark-submit to submit my job and printed the _jvm object, but it is 
null, which explains why my SparkConf object is ignored.
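
To make the scenario concrete, here is a minimal sketch of the kind of thing my 
job does. The app name and config key are just illustrative examples, and the 
two prints poke at private attributes purely for debugging:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("conf-repro")   # illustrative app name
conf.set("spark.executor.memory", "2g")       # illustrative config key

# Private attributes, printed only to see which branch _do_init will take.
print(SparkContext._jvm)   # None on 2.1.0 in my runs (submitted via spark-submit)
print(conf._jconf)

sc = SparkContext(conf=conf)

# On 2.0.1 this prints "2g"; on 2.1.0 in my runs the setting above is not picked up.
print(sc.getConf().get("spark.executor.memory", "<not set>"))

sc.stop()
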
I've tried running exactly the same code on Spark 2.0.1 and it worked! My 
SparkConf object had a valid _jvm object.

Am I doing something wrong, or is this a bug?

  was:
Trying to migrate from Spark 1.6 to 2.1, I've stumbled upon a small problem: my 
SparkContext doesn't get its configurations from the SparkConf object. Before 
passing the SparkConf to the SparkContext constructor, I've made sure my 
configurations are set.

I've done some digging and this is what I've found:

When I initialize the SparkContext, the following code is executed:

def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
             conf, jsc, profiler_cls):
    self.environment = environment or {}
    # java gateway must have been launched at this point.
    if conf is not None and conf._jconf is not None:
        # conf has been initialized in JVM properly, so use conf directly. This represent the
        # scenario that JVM has been launched before SparkConf is created (e.g. SparkContext is
        # created and then stopped, and we create a new SparkConf and new SparkContext again)
        self._conf = conf
    else:
        self._conf = SparkConf(_jvm=SparkContext._jvm)


So I can see that the only way my SparkConf will be used is if it also has a 
_jvm object.
I've used spark-submit to submit my job and printed the _jvm object, but it is 
null, which explains why my SparkConf object is ignored.
I've tried running exactly the same code on Spark 2.0.1 and it worked! My 
SparkConf object had a valid _jvm object.

Am I doing something wrong, or is this a bug?


> SparkConf not getting properly initialized in PySpark 2.1.0
> -----------------------------------------------------------
>
>                 Key: SPARK-19369
>                 URL: https://issues.apache.org/jira/browse/SPARK-19369
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>         Environment: Windows/Linux
>            Reporter: Sidney Feiner
>              Labels: configurations, context, pyspark



