Hi, sorry if I am being noisy, but I wanted to draw your attention to SPARK-50994 <https://issues.apache.org/jira/browse/SPARK-50994>. It was raised because when a `Dataset` is converted into an `RDD`, it executes the `SparkPlan` without any execution context. This leads to:
1. No tracking is available on the Spark UI for the stages that are needed to build the `RDD`.
2. Spark properties that are local to the driver thread may not be set in the `RDD` execution context. As a result, these properties are not sent with the `TaskContext`, yet some operations, like reading Parquet files, depend on them (e.g., case sensitivity).
#2 can lead to data correctness issues. See the test case added in the PR <https://github.com/apache/spark/pull/49678>; the current version produces incorrect values for the dedup operation. I feel that #1 is also useful, since operations performed before the RDD conversion are not traceable on the Spark UI at the moment. Thanks.
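If anyone wants a quick way to see the property-propagation side of #2, here is a minimal sketch (not the PR's actual test case; the conf key, local-mode setup, and check via `TaskContext` local properties are my assumptions) that probes whether a thread-local SQL conf is visible inside a task launched through `Dataset.rdd`:
```
import org.apache.spark.TaskContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("dataset-rdd-localprops-sketch")
  .getOrCreate()

// SQL confs set here are local to the driver thread; during normal SQL
// execution they are propagated to tasks as TaskContext local properties.
spark.conf.set("spark.sql.caseSensitive", "true")

val ds = spark.range(4)

// Calling .rdd executes the SparkPlan. Per the report, this happens
// without an execution context, so the thread-local conf may never be
// shipped with the TaskContext.
val seen = ds.rdd
  .map(_ => Option(TaskContext.get().getLocalProperty("spark.sql.caseSensitive")))
  .distinct()
  .collect()

// With proper propagation one would expect Some(true) in every task;
// the report implies None shows up here instead.
println(seen.mkString(", "))

spark.stop()
```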