Ankur Dave created SPARK-45616:
----------------------------------
Summary: Usages of ParVector are unsafe because it does not
propagate ThreadLocals or SparkSession
Key: SPARK-45616
URL: https://issues.apache.org/jira/browse/SPARK-45616
Project: Spark
Issue Type: Bug
Components: Spark Core, SQL, Tests
Affects Versions: 3.5.0
Reporter: Ankur Dave
Assignee: Ankur Dave
CastSuiteBase and ExpressionInfoSuite use ParVector.foreach() to run Spark SQL
queries in parallel. They incorrectly assume that each parallel operation will
inherit the main thread's active SparkSession. That holds only when the parallel
operations run in freshly created threads, because a thread inherits the active
session only at the moment it is created. However, if other code has already run
parallel operations before Spark was started, ParVector may reuse existing
threads that have no active SparkSession. In that case, these tests fail with
NullPointerExceptions when creating SparkPlans or running SQL queries.
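The failure mode can be reproduced without Spark. The sketch below is a minimal illustration, assuming the active session behaves like a plain InheritableThreadLocal; the activeSession field and the StaleThreadDemo object are hypothetical stand-ins, not Spark code. A pool whose worker thread was created before the value was set keeps that thread, and the thread never sees the value. A pool created afterwards spawns a fresh thread that does see it.

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors}

object StaleThreadDemo {
  // Hypothetical stand-in for SparkSession's per-thread active session.
  val activeSession = new InheritableThreadLocal[String]

  // Runs a task on the pool and returns the value visible on the worker thread.
  def readOn(pool: ExecutorService): String =
    pool.submit(new Callable[String] {
      def call(): String = activeSession.get()
    }).get()

  def run(): (String, String) = {
    // 1. "Other code" warms up a shared single-thread pool before any session
    //    exists, so its worker thread is created with no inherited value.
    val sharedPool = Executors.newFixedThreadPool(1)
    readOn(sharedPool)

    // 2. The main thread now sets its active session.
    activeSession.set("session-A")

    // 3. The reused worker still sees null: inheritance happened only once,
    //    when that thread was constructed.
    val fromReusedThread = readOn(sharedPool)

    // 4. A pool created after the value is set starts a fresh thread,
    //    which copies the creating thread's value at construction time.
    val freshPool = Executors.newFixedThreadPool(1)
    val fromFreshThread = readOn(freshPool)

    sharedPool.shutdown()
    freshPool.shutdown()
    activeSession.remove()
    (fromReusedThread, fromFreshThread)
  }
}
```

Calling StaleThreadDemo.run() returns (null, "session-A"): the reused thread reproduces the missing-session case, and the fresh thread shows why the tests pass when no pool threads existed beforehand.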
The fix is to use the existing ThreadUtils.parmap() method instead. It creates
fresh threads, which inherit the current active SparkSession, and it propagates
the Spark ThreadLocals.
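The general technique can be sketched as follows. The parameter list below is modeled on ThreadUtils.parmap's (in, prefix, maxThreads)(f), but the body is our own sketch and not Spark's implementation. The activeSession and localProperty fields are hypothetical stand-ins for Spark state. The approach creates a new pool for every call, so its threads inherit InheritableThreadLocals from the caller. It also captures a plain, non-inheritable ThreadLocal on the calling thread and re-installs it inside each task.

```scala
import java.util.concurrent.{Callable, Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

object FreshPoolParmap {
  // Hypothetical stand-ins for Spark state (not Spark's real fields).
  val activeSession = new InheritableThreadLocal[String]
  val localProperty = new ThreadLocal[String] // NOT inheritable

  /** Maps f over in on a pool created for this call only.
    * Worker threads are created on the calling thread during submit, so they
    * inherit its InheritableThreadLocals; the plain ThreadLocal is copied
    * explicitly into every task. */
  def parmap[I, O](in: Seq[I], prefix: String, maxThreads: Int)(f: I => O): Seq[O] = {
    val counter = new AtomicInteger(0)
    val factory = new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, s"$prefix-${counter.incrementAndGet()}")
        t.setDaemon(true) // never keep the JVM alive because of this pool
        t
      }
    }
    val pool = Executors.newFixedThreadPool(maxThreads, factory)
    // Capture the caller's non-inheritable state once, on the calling thread.
    val captured = localProperty.get()
    try {
      // toVector forces every task to be submitted before any result is awaited.
      val futures = in.toVector.map { x =>
        pool.submit(new Callable[O] {
          def call(): O = {
            val saved = localProperty.get()
            localProperty.set(captured)
            try f(x) finally localProperty.set(saved)
          }
        })
      }
      // Results are collected in input order.
      futures.map(_.get())
    } finally {
      pool.shutdownNow() // the pool is discarded, so no thread is ever reused
    }
  }
}
```

Because the pool is discarded after each call, a later caller with a different active session never runs on a thread that inherited an earlier caller's state. The cost is thread creation on every call, which is acceptable in tests.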
We should also add a scalastyle warning against use of ParVector.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)