[
https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Norbert Schultz updated SPARK-34115:
------------------------------------
Component/s: SQL
> Long runtime on many environment variables
> ------------------------------------------
>
> Key: SPARK-34115
> URL: https://issues.apache.org/jira/browse/SPARK-34115
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.4.0, 2.4.7
> Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
> Reporter: Norbert Schultz
> Priority: Major
> Attachments: spark-bug-34115.tar.gz
>
>
> I am not sure if this is a bug report or a feature request. The code is
> the same in current versions of Spark, and maybe this ticket saves
> someone some time debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the integration
> tests on our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during
> DataFrame analysis in the following functions:
> * AnalysisHelper.assertNotAnalysisRule, which calls
> * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed
> all services as environment variables, so it had more than 3000 environment
> variables.
> Utils.isTesting is called very often through
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
> transformUp), so the cost adds up quickly.
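> As a rough illustration of the cost (a hypothetical micro-benchmark, not
> part of the attached reproduction):
>
> {code:java}
> // Hypothetical micro-benchmark: each sys.env access in Scala 2.x builds
> // a fresh immutable Map from System.getenv(), so a single lookup costs
> // O(#env vars). With ~3000 variables this dominates plan analysis.
> val start = System.nanoTime()
> (1 to 100000).foreach(_ => sys.env.contains("SPARK_TESTING"))
> println(s"100000 sys.env lookups took ${(System.nanoTime() - start) / 1e6} ms")
> {code}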
>
> Of course we will restrict the number of environment variables on our
> side; on the other hand, Utils.isTesting could also use a lazy val for
>
> {code:java}
> sys.env.contains("SPARK_TESTING")
> {code}
>
> so that it is not as expensive.
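> For illustration, a minimal sketch of what such a lazy val could look
> like (the surrounding object and the val name are hypothetical, not the
> actual Utils implementation):
>
> {code:java}
> object Utils {
>   // Hypothetical sketch: evaluate the environment lookup once and reuse
>   // it, instead of copying all environment variables on every call.
>   private lazy val isTestingCached: Boolean = sys.env.contains("SPARK_TESTING")
>
>   def isTesting: Boolean = isTestingCached
> }
> {code}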
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]