rahil-c commented on code in PR #17904:
URL: https://github.com/apache/hudi/pull/17904#discussion_r2761067889
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/SparkBasicSchemaEvolution.scala:
##########
@@ -20,32 +20,118 @@
package org.apache.spark.sql.execution.datasources.parquet
import org.apache.hudi.SparkAdapterSupport.sparkAdapter
-
+import org.apache.hudi.common.model.HoodieFileFormat
import org.apache.spark.sql.HoodieSchemaUtils
import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
-import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.execution.datasources.SparkSchemaTransformUtils
+import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructField, StructType}
/**
- * Intended to be used just with HoodieSparkParquetReader to avoid any java/scala issues
+ * Generic schema evolution handler for different file formats.
+ * Supports Parquet (default), and Lance currently.
Review Comment:
I've pushed a recent change that implements the chaining idea for separating
out the null projection:
https://github.com/apache/hudi/pull/17904/changes/5196402c7466be0f1dd7de66b6aa653cf44f2e09
I synced with Tim, and the main feedback is to see whether we can improve this
further by avoiding any case switches on the file format in
`SparkBasicSchemaEvolution` and other schema-related classes. Ideally, we should
move any file-format-specific logic into the respective callers (in this case,
our file format reader classes).
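To make the direction concrete, here is a rough, dependency-free sketch of the shape this could take. Everything in it is hypothetical: `SchemaEvolutionSketch`, `Projection`, `nullPadding`, `ParquetReaderSketch`, and `LanceReaderSketch` are illustration-only names, rows are modeled as plain maps rather than Spark `InternalRow`s, and which format needs which step is an assumption made purely for the example, not a statement about the PR's code.

```scala
// Hypothetical sketch only: none of these names exist in Hudi or Spark.
object SchemaEvolutionSketch {
  // A row is modeled as a simple map from column name to value.
  type Row = Map[String, Any]
  // A projection step rewrites one row into another.
  type Projection = Row => Row

  // Reusable step: emit exactly the required columns, filling columns
  // that are missing from the file row with null.
  def nullPadding(requiredColumns: Seq[String]): Projection =
    row => requiredColumns.map(c => c -> row.getOrElse(c, null)).toMap

  // Format-agnostic handler: it chains whatever steps the caller passes in
  // and never matches on the file format itself.
  final class SchemaEvolution(steps: Seq[Projection]) {
    private val chained: Projection =
      steps.foldLeft((r: Row) => r)(_ andThen _)
    def apply(row: Row): Row = chained(row)
  }

  // Format-specific decisions live in the callers (the reader classes).
  // Assumption for illustration: this reader needs no extra step.
  object ParquetReaderSketch {
    def evolution(required: Seq[String]): SchemaEvolution =
      new SchemaEvolution(Seq.empty)
  }

  // Assumption for illustration: this reader adds the null-padding step.
  object LanceReaderSketch {
    def evolution(required: Seq[String]): SchemaEvolution =
      new SchemaEvolution(Seq(nullPadding(required)))
  }
}
```

With this shape, supporting another format would mean adding a reader that picks its own steps, while the shared handler stays unchanged.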
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.