[ 
https://issues.apache.org/jira/browse/SPARK-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated SPARK-25004:
------------------------------
    Description: 
Some platforms support limiting Python's addressable memory space by limiting 
[{{resource.RLIMIT_AS}}|https://docs.python.org/3/library/resource.html#resource.RLIMIT_AS].
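
As a concrete illustration of the mechanism (a minimal sketch, not Spark's 
implementation; the function name and the 2 GiB value are illustrative 
assumptions), a worker process could cap its own address space like this:

{code:lang=python}
import resource

def cap_address_space(limit_bytes):
    # Lower only the soft limit and keep the existing hard limit, so the
    # cap could be raised again if needed. Allocations beyond the limit
    # fail with MemoryError instead of growing past the container size.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))

# Cap the Python worker at 2 GiB of addressable memory.
cap_address_space(2 * 1024 * 1024 * 1024)
{code}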

We've found that adding this limit is very useful when running in YARN: when 
Python doesn't know about its memory constraints, it has no reason to garbage 
collect and keeps using memory it no longer needs. Adding a limit reduces 
PySpark memory consumption and avoids YARN killing containers because Python 
hasn't cleaned up memory.

This also improves error messages for users: they see a Python MemoryError 
when too much memory is allocated, rather than YARN killing the container with 
no Python-level error:

{code:lang=python}
  File "build/bdist.linux-x86_64/egg/package/library.py", line 265, in fe_engineer
    fe_eval_rec.update(f(src_rec_prep, mat_rec_prep))
  File "build/bdist.linux-x86_64/egg/package/library.py", line 163, in fe_comp
    comparisons = EvaluationUtils.leven_list_compare(src_rec_prep.get(item, []), mat_rec_prep.get(item, []))
  File "build/bdist.linux-x86_64/egg/package/evaluationutils.py", line 25, in leven_list_compare
    permutations = sorted(permutations, reverse=True)
MemoryError
{code}
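
If a {{spark.executor.pyspark.memory}} setting is added as proposed in this 
issue's title, usage from PySpark could look roughly like the sketch below 
(the {{2g}} value and the exact enforcement behavior are assumptions, not the 
implemented feature):

{code:lang=python}
from pyspark import SparkConf, SparkContext

# Hypothetical configuration: cap each executor's Python workers at 2g,
# in addition to the JVM executor memory.
conf = (SparkConf()
        .setAppName("pyspark-memory-limit-example")
        .set("spark.executor.memory", "4g")
        .set("spark.executor.pyspark.memory", "2g"))

sc = SparkContext(conf=conf)
{code}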


> Add spark.executor.pyspark.memory config to set resource.RLIMIT_AS
> ------------------------------------------------------------------
>
>                 Key: SPARK-25004
>                 URL: https://issues.apache.org/jira/browse/SPARK-25004
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Priority: Major
>


