Harsh Panchal created SPARK-50994:
-------------------------------------
Summary: Track RDD conversion under execution context
Key: SPARK-50994
URL: https://issues.apache.org/jira/browse/SPARK-50994
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.2.1, 4.1.0
Reporter: Harsh Panchal
When a `Dataset` is converted into an `RDD`, the `SparkPlan` is executed without any
execution context. This leads to:
# No tracking is available on the Spark UI for the stages that are necessary to build the `RDD`.
# Thread-local Spark properties may not be set in the `RDD` execution context, so they are not propagated with the `TaskContext`, yet some operations, such as reading Parquet files, depend on them (e.g., case sensitivity).

A workaround for #2 is to use `SparkContext.setLocalProperty` to manually set the
required properties. The permanent solution is to run RDD generation in a separate
execution context, where both of these issues are resolved.
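A minimal sketch of the workaround, assuming the affected property is the session-local case-sensitivity setting (the specific property name is an illustrative assumption; the same pattern applies to any thread-local property the tasks need):

```scala
import org.apache.spark.sql.SparkSession

object LocalPropertyWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rdd-local-property-workaround")
      .getOrCreate()

    // Session-local SQL config; not automatically propagated when
    // Dataset.rdd executes the plan outside an execution context.
    spark.conf.set("spark.sql.caseSensitive", "true")

    // Workaround: mirror the property as a thread-local SparkContext
    // property so it travels to the tasks via TaskContext.
    spark.sparkContext.setLocalProperty(
      "spark.sql.caseSensitive",
      spark.conf.get("spark.sql.caseSensitive"))

    // The SparkPlan is executed here to materialize the RDD; without an
    // execution context these stages are not tracked on the Spark UI.
    val rdd = spark.range(5).toDF("id").rdd
    println(rdd.count())

    spark.stop()
  }
}
```

Note that `setLocalProperty` must be called on the same thread that triggers the RDD conversion, since the properties are thread-local.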
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]