Harsh Panchal created SPARK-50994:
-------------------------------------

             Summary: Track RDD conversion under execution context
                 Key: SPARK-50994
                 URL: https://issues.apache.org/jira/browse/SPARK-50994
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.2.1, 4.1.0
            Reporter: Harsh Panchal


When a `Dataset` is converted into an `RDD`, it executes the `SparkPlan` without 
any execution context. This leads to:
 # No tracking is available on the Spark UI for the stages that are necessary to 
build the `RDD`.
 # Spark properties that are local to the thread may not be set in the `RDD` 
execution context, so these properties are not sent with the `TaskContext`, 
even though some operations, such as reading Parquet files, depend on them 
(e.g., case sensitivity).
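A minimal sketch of the behaviour (the data and session setup are illustrative, 
not from the report):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()
import spark.implicits._

val ds = Seq(("a", 1), ("b", 2)).toDS()

// A Dataset action runs under an execution id, so its jobs are linked
// to a query on the SQL tab of the Spark UI:
ds.collect()

// Converting to an RDD executes the SparkPlan outside any execution
// context: the jobs that materialise the RDD are not linked to a query
// on the SQL tab, and thread-local SQL properties may not reach the tasks.
val rdd = ds.rdd
rdd.count()
```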



To resolve #2, a workaround is to use `SparkContext.setLocalProperty` to 
manually set the required properties. The permanent solution is to run the RDD 
generation in a separate execution context in which both of these issues are 
resolved.
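A hedged sketch of the workaround for #2, assuming the RDD's tasks depend on 
`spark.sql.caseSensitive` (the path and session setup are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Copy the SQL conf value the tasks need into a thread-local property,
// so it travels with the TaskContext even though the RDD conversion
// runs outside an execution context.
sc.setLocalProperty("spark.sql.caseSensitive",
  spark.conf.get("spark.sql.caseSensitive"))

// The Parquet readers launched for this RDD can now observe the property.
val rdd = spark.read.parquet("/path/to/data").rdd
```

This must be repeated on every thread that triggers the conversion, since local 
properties are per-thread; that is why it is only a workaround and not a fix.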



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
