[ https://issues.apache.org/jira/browse/SPARK-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062333#comment-17062333 ]
Xiaochen Ouyang commented on SPARK-25004:
-----------------------------------------

[~rdblue] This configuration only limits the worker.py process; it cannot cap the maximum memory of child processes derived from it. In the chain Worker (JVM) --> Executor --> python.daemon --> python.daemon, the last python daemon process is not controlled by this configuration.

> Add spark.executor.pyspark.memory config to set resource.RLIMIT_AS
> ------------------------------------------------------------------
>
>                 Key: SPARK-25004
>                 URL: https://issues.apache.org/jira/browse/SPARK-25004
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Some platforms support limiting Python's addressable memory space by limiting
> [{{resource.RLIMIT_AS}}|https://docs.python.org/3/library/resource.html#resource.RLIMIT_AS].
> We've found that adding a limit is very useful when running in YARN because
> when Python doesn't know about memory constraints, it doesn't know when to
> garbage collect and will continue using memory when it doesn't need to.
> Adding a limit reduces PySpark memory consumption and avoids YARN killing
> containers because Python hasn't cleaned up memory.
> This also improves error messages for users, allowing them to see when Python
> is allocating too much memory instead of YARN killing the container:
> {code:lang=python}
>   File "build/bdist.linux-x86_64/egg/package/library.py", line 265, in fe_engineer
>     fe_eval_rec.update(f(src_rec_prep, mat_rec_prep))
>   File "build/bdist.linux-x86_64/egg/package/library.py", line 163, in fe_comp
>     comparisons = EvaluationUtils.leven_list_compare(src_rec_prep.get(item, []), mat_rec_prep.get(item, []))
>   File "build/bdist.linux-x86_64/egg/package/evaluationutils.py", line 25, in leven_list_compare
>     permutations = sorted(permutations, reverse=True)
> MemoryError
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
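
For readers unfamiliar with {{resource.RLIMIT_AS}}, the following is a minimal sketch of the mechanism the issue describes. It is not Spark's implementation. The helper name {{run_with_limit}} and the sizes used are hypothetical, and it assumes a Linux host where the address-space limit is enforced. It starts a fresh Python child, applies the limit in that child before it runs, and reports whether a large allocation raises MemoryError. Note that an rlimit is a per-process cap that forked children inherit: each process gets its own limit rather than sharing one budget, which is relevant to the concern raised in the comment above.

```python
import resource
import subprocess
import sys

# Code run inside the limited child: try to allocate argv[1] bytes.
CHILD_SNIPPET = """
import sys
try:
    buf = bytearray(int(sys.argv[1]))
    print("ok")
except MemoryError:
    print("MemoryError")
"""


def run_with_limit(limit_bytes, alloc_bytes):
    """Run a Python child under an RLIMIT_AS soft limit (hypothetical helper).

    Returns the child's one-word report: "ok" or "MemoryError".
    """

    def set_limit():
        # Runs in the child between fork and exec; the limit survives exec.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = limit_bytes
        if hard != resource.RLIM_INFINITY:
            # The soft limit may not exceed an existing hard limit.
            soft = min(soft, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    result = subprocess.run(
        [sys.executable, "-c", CHILD_SNIPPET, str(alloc_bytes)],
        preexec_fn=set_limit,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    one_gib = 1 << 30
    # A 4 GiB request under a 1 GiB address-space cap.
    print(run_with_limit(one_gib, 4 * one_gib))
    # A 1 MiB request under the same cap.
    print(run_with_limit(one_gib, 1 << 20))
```

With the limit in place, the oversized allocation fails inside Python as a MemoryError that the user can see, rather than the process growing until an external resource manager kills it.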