Ryan Blue created SPARK-28843:
---------------------------------

             Summary: Set OMP_NUM_THREADS to executor cores reduce Python memory consumption
                 Key: SPARK-28843
                 URL: https://issues.apache.org/jira/browse/SPARK-28843
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 2.4.3, 2.3.3, 3.0.0
            Reporter: Ryan Blue

While testing on hardware with more cores, we found that the memory required by PySpark applications increased, and we tracked the problem to importing numpy. The numpy issue is [https://github.com/numpy/numpy/issues/10455].

NumPy uses OpenMP, which starts a thread pool sized to the number of cores on the machine (and does not respect cgroups). When we set this lower, we see a reduction in memory consumption.

This parallelism setting should be set to the number of cores allocated to the executor, not the number of cores available on the machine.
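
A minimal sketch of the proposed behaviour, not the actual Spark change: it builds the environment for a Python worker and defaults OMP_NUM_THREADS to the executor's core count. The helper name `worker_env` and the `executor_cores` parameter are hypothetical, and leaving a user-supplied OMP_NUM_THREADS untouched is an assumption of this sketch rather than something the issue states.

```python
import os


def worker_env(base_env, executor_cores):
    """Return a copy of base_env with OMP_NUM_THREADS defaulted to executor_cores.

    Hypothetical helper for illustration only. A value already present in
    base_env is kept, on the assumption that an explicit user setting wins.
    """
    if executor_cores < 1:
        raise ValueError("executor_cores must be at least 1")
    env = dict(base_env)  # copy so the caller's mapping is not mutated
    env.setdefault("OMP_NUM_THREADS", str(executor_cores))
    return env


if __name__ == "__main__":
    # The variable has to be in the environment before numpy is imported,
    # because the thread pool is sized when the library loads.
    os.environ.update(worker_env(os.environ, executor_cores=2))
    print(os.environ["OMP_NUM_THREADS"])
```

As a workaround on affected versions, the same variable can be passed to executors through Spark's `spark.executorEnv.OMP_NUM_THREADS` configuration property.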