[ 
https://issues.apache.org/jira/browse/SPARK-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062333#comment-17062333
 ] 

Xiaochen Ouyang commented on SPARK-25004:
-----------------------------------------

[~rdblue] This configuration only limits the worker.py process itself; it 
cannot limit the maximum memory of child processes derived from it. 

In the chain Worker (JVM) --> Executor --> python.daemon --> python.daemon, 
the last python daemon process is not controlled by this configuration.

> Add spark.executor.pyspark.memory config to set resource.RLIMIT_AS
> ------------------------------------------------------------------
>
>                 Key: SPARK-25004
>                 URL: https://issues.apache.org/jira/browse/SPARK-25004
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Some platforms support limiting Python's addressable memory space by limiting 
> [{{resource.RLIMIT_AS}}|https://docs.python.org/3/library/resource.html#resource.RLIMIT_AS].
> We've found that adding a limit is very useful when running in YARN because 
> when Python doesn't know about memory constraints, it doesn't know when to 
> garbage collect and will continue using memory when it doesn't need to. 
> Adding a limit reduces PySpark memory consumption and avoids YARN killing 
> containers because Python hasn't cleaned up memory.
> This also improves error messages for users, allowing them to see when Python 
> is allocating too much memory instead of YARN killing the container:
> {code:lang=python}
>   File "build/bdist.linux-x86_64/egg/package/library.py", line 265, in fe_engineer
>     fe_eval_rec.update(f(src_rec_prep, mat_rec_prep))
>   File "build/bdist.linux-x86_64/egg/package/library.py", line 163, in fe_comp
>     comparisons = EvaluationUtils.leven_list_compare(src_rec_prep.get(item, []), mat_rec_prep.get(item, []))
>   File "build/bdist.linux-x86_64/egg/package/evaluationutils.py", line 25, in leven_list_compare
>     permutations = sorted(permutations, reverse=True)
>   MemoryError
> {code}
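
For illustration, here is a minimal sketch of the mechanism the issue 
describes: lowering resource.RLIMIT_AS so that an oversized allocation raises 
MemoryError in Python instead of the container being killed. This is not 
Spark's implementation. The helper names (mb_to_bytes, cap_address_space, 
allocation_fails_under_limit) are hypothetical, and the sketch assumes a Unix 
platform that enforces RLIMIT_AS. The check runs in a forked child so that the 
calling process keeps its own limit.

```python
import os
import resource


def mb_to_bytes(limit_mb):
    """Convert a size in MiB to bytes."""
    return limit_mb * 1024 * 1024


def cap_address_space(limit_mb):
    """Lower the soft RLIMIT_AS to limit_mb MiB; never raise an existing limit.

    Returns True if the limit was changed.
    """
    new_limit = mb_to_bytes(limit_mb)
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if soft == resource.RLIM_INFINITY or new_limit < soft:
        # Keep the hard limit untouched; only the soft limit is tightened.
        resource.setrlimit(resource.RLIMIT_AS, (new_limit, hard))
        return True
    return False


def allocation_fails_under_limit(limit_mb, alloc_mb):
    """Fork a child, cap its address space, and try to allocate alloc_mb MiB.

    Returns True if the child saw MemoryError, False otherwise.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: the limit set here does not affect the parent.
        try:
            cap_address_space(limit_mb)
            bytearray(mb_to_bytes(alloc_mb))
            os._exit(0)  # allocation succeeded
        except MemoryError:
            os._exit(1)  # allocation was refused by the limit
        except BaseException:
            os._exit(2)  # anything else, e.g. the platform rejected setrlimit
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 1
```

On a platform that enforces the limit, a call such as 
allocation_fails_under_limit(1024, 2048) would be expected to report that the 
2 GiB allocation was refused under a 1 GiB cap. The limit is applied only in 
the process that calls setrlimit (and in anything forked from it afterwards), 
so a process that never runs that call is not constrained by it.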


