yihua commented on PR #8082: URL: https://github.com/apache/hudi/pull/8082#issuecomment-1535453367
If we'd like to include this fix in the 0.13.1 release without introducing performance problems for existing Spark versions, could we consider the following to narrow down the scope of impact?

(1) Could we disable the nested schema pruning optimization rule for Spark 3.3.2 only, and see whether the tests pass (without changing the vectorized reader config)? This can be done by not adding `org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning` for Spark 3.3.2 in [HoodieAnalysis](https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala#L132).

(2) If the above does not work, could we disable the vectorized reader for Spark 3.3.2 only, and still use Spark 3.3.1 as the compile dependency in that case?

(3) Could we also list all the failed tests and see what they have in common for further investigation?
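
To make option (1) concrete, here is a minimal sketch of gating the rule on the exact Spark version. It is not the actual `HoodieAnalysis` code: the object `NestedSchemaPruningGate`, the method `optimizerRules`, and the set `VersionsWithoutPruning` are hypothetical names, and the version string is passed in as a plain parameter so the snippet has no Spark dependency. Only the rule's class name comes from the comment above.

```scala
// Hypothetical sketch; names below are illustrative, not Hudi APIs.
object NestedSchemaPruningGate {
  // Rule class name as quoted in the comment.
  val Spark33PruningRule: String =
    "org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning"

  // Assumption: only Spark 3.3.2 needs the rule turned off.
  val VersionsWithoutPruning: Set[String] = Set("3.3.2")

  /** Returns the optimizer rule class names to register for a Spark version. */
  def optimizerRules(sparkVersion: String): Seq[String] = {
    val isSpark33 = sparkVersion.startsWith("3.3.")
    if (isSpark33 && !VersionsWithoutPruning.contains(sparkVersion)) {
      Seq(Spark33PruningRule)
    } else {
      // Either not a 3.3.x release, or a release where the rule is skipped.
      Seq.empty
    }
  }
}
```

With this shape, Spark 3.3.0 and 3.3.1 keep the optimization and only 3.3.2 loses it, which is the narrowing that (1) asks for. For option (2), Spark exposes the `spark.sql.parquet.enableVectorizedReader` setting; whether that is the exact config this PR changes is an assumption.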
