[ https://issues.apache.org/jira/browse/SPARK-17387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475242#comment-15475242 ]

Bryan Cutler commented on SPARK-17387:
--------------------------------------

[~vanzin] you said that if you use pyspark you get the correct "4g" memory 
option, but I was only able to do this by passing it on the command line, like this:

{noformat}
$> bin/pyspark --conf spark.driver.memory=4g
{noformat}

Is that what you meant?  I think just passing the user's confs to the JVM 
command line when starting from a plain Python shell would be a simple fix, and 
it would still be in line with how the Scala spark-shell works too.
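
Something along these lines is what I had in mind. This is only a sketch of the 
idea, not the actual code: the {{conf}} parameter on {{launch_gateway}} is 
hypothetical (today the function takes no arguments), and the helper just turns 
the user's SparkConf entries into spark-submit style --conf flags before the 
JVM is started.

{noformat}
# Sketch only: let SparkContext hand its SparkConf to launch_gateway, which
# turns each entry into a --conf flag for spark-submit.  The "conf" parameter
# is hypothetical; the current launch_gateway() takes no arguments.
import os

def launch_gateway(conf=None):
    if "PYSPARK_SUBMIT_ARGS" not in os.environ:
        args = []
        if conf is not None:
            for key, value in conf.getAll():
                args += ["--conf", "%s=%s" % (key, value)]
        # "pyspark-shell" must stay last; anything after it is treated as app args
        os.environ["PYSPARK_SUBMIT_ARGS"] = " ".join(args + ["pyspark-shell"])
    # ... existing logic: run bin/spark-submit with these args and connect Py4J ...
{noformat}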

> Creating SparkContext() from python without spark-submit ignores user conf
> --------------------------------------------------------------------------
>
>                 Key: SPARK-17387
>                 URL: https://issues.apache.org/jira/browse/SPARK-17387
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Marcelo Vanzin
>            Priority: Minor
>
> Consider the following scenario: user runs a python application not through 
> spark-submit, but by adding the pyspark module and manually creating a Spark 
> context. Kinda like this:
> {noformat}
> $ SPARK_HOME=$PWD PYTHONPATH=python:python/lib/py4j-0.10.3-src.zip python
> Python 2.7.12 (default, Jul  1 2016, 15:12:24) 
> [GCC 5.4.0 20160609] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from pyspark import SparkContext
> >>> from pyspark import SparkConf
> >>> conf = SparkConf().set("spark.driver.memory", "4g")
> >>> sc = SparkContext(conf=conf)
> {noformat}
> If you look at the JVM launched by the pyspark code, it ignores the user's 
> configuration:
> {noformat}
> $ ps ax | grep $(pgrep -f SparkSubmit)
> 12283 pts/2    Sl+    0:03 /apps/java7/bin/java -cp ... -Xmx1g 
> -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit pyspark-shell
> {noformat}
> Note the "1g" of memory. If instead you use "pyspark", you get the correct 
> "4g" in the JVM.
> This also affects other configs; for example, you can't really add jars to 
> the driver's classpath using "spark.jars".
> You can work around this by setting the undocumented env variable Spark 
> itself uses:
> {noformat}
> $ SPARK_HOME=$PWD PYTHONPATH=python:python/lib/py4j-0.10.3-src.zip python
> Python 2.7.12 (default, Jul  1 2016, 15:12:24) 
> [GCC 5.4.0 20160609] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os
> >>> os.environ['PYSPARK_SUBMIT_ARGS'] = "--conf spark.driver.memory=4g pyspark-shell"
> >>> from pyspark import SparkContext
> >>> sc = SparkContext()
> {noformat}
> But it would be nicer if the configs were automatically propagated.
> BTW the reason for this is that the {{launch_gateway}} function used to start 
> the JVM does not take any parameters, and the only place it picks up Spark 
> arguments from is that env variable.
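
To make that last point concrete, the current launch path looks roughly like 
the sketch below. This is a simplification of {{python/pyspark/java_gateway.py}}, 
not the exact source; the point is that PYSPARK_SUBMIT_ARGS is the only channel 
through which Spark arguments reach the spark-submit command line.

{noformat}
# Simplified sketch of the current behaviour (not the exact source).
# launch_gateway() takes no parameters, so SparkConf settings made in Python
# never make it onto the spark-submit command line that starts the JVM.
import os
from subprocess import Popen

def launch_gateway():
    submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
    command = [os.path.join(os.environ["SPARK_HOME"], "bin", "spark-submit")]
    command += submit_args.split()
    proc = Popen(command)  # JVM driver started here, with whatever memory spark-submit derives
    # ... existing logic: wait for the gateway port and connect via Py4J ...
    return proc
{noformat}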


