rahil-c commented on code in PR #17904:
URL: https://github.com/apache/hudi/pull/17904#discussion_r2757198006


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/SparkBasicSchemaEvolution.scala:
##########
@@ -20,32 +20,118 @@
 package org.apache.spark.sql.execution.datasources.parquet
 
 import org.apache.hudi.SparkAdapterSupport.sparkAdapter
-
+import org.apache.hudi.common.model.HoodieFileFormat
 import org.apache.spark.sql.HoodieSchemaUtils
 import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
-import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.execution.datasources.SparkSchemaTransformUtils
+import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructField, StructType}
 
 
 /**
- * Intended to be used just with HoodieSparkParquetReader to avoid any java/scala issues
+ * Generic schema evolution handler for different file formats.
+ * Supports Parquet (default), and Lance currently.

Review Comment:
   @the-other-tim-brown 
   Ack. In my earlier response, 
https://github.com/apache/hudi/pull/17904#discussion_r2748679238, I asked for a 
little more clarification on what you would like the refactor to look like, 
since right now it's not clear to me where this null padding piece would be 
placed. I am open to your suggestion, just trying to understand how it 
would look.
   
   Assuming we then do no null padding on either top-level or nested fields 
in the following places:
   * 
https://github.com/apache/hudi/pull/17904/changes#diff-56d3b110e2b04263ed60368227bddd9bef085799f4917701f936cbc9f7f71572R77
   * 
https://github.com/apache/hudi/pull/17904/changes#diff-56d3b110e2b04263ed60368227bddd9bef085799f4917701f936cbc9f7f71572R125
   
   Then the UnsafeProjection we currently return from this function 
https://github.com/apache/hudi/pull/17904/changes#diff-56d3b110e2b04263ed60368227bddd9bef085799f4917701f936cbc9f7f71572R81
   would not be fully correct, since it would not align with the evolved schema.
   
   So I am wondering whether your idea is that, before we apply this projection 
to the iterator of unsafe rows, we modify the existing projection or create a 
new projection with the null padding, either in `SparkBasicSchemaEvolution`
   
https://github.com/apache/hudi/pull/17904/changes#diff-8ed98fba80253c795ae16cb143f54eba4cc9616774c85ce8eb4ad9a83f422863R127
   
   Or in the `SparkLanceReaderBase`
   
https://github.com/apache/hudi/pull/17904/changes#diff-bdccaaaeb061abdf550efec86661f9d3790c66d53e04b1ed2e9cf9a61ea06e13R135
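   
   To make the alignment concern concrete, here is a minimal, Spark-free sketch 
of what "null padding" means in this discussion. It is not the PR's code and 
does not use `UnsafeProjection` or `SparkSchemaTransformUtils`; `FieldType`, 
`Leaf`, `Struct`, and `NullPaddingSketch.pad` are hypothetical names, and rows 
are modeled as plain `Seq[Any]` purely for illustration.

```scala
// Hypothetical model: a field is either a leaf value or a nested struct.
sealed trait FieldType
case object Leaf extends FieldType
final case class Struct(fields: Seq[(String, FieldType)]) extends FieldType

object NullPaddingSketch {
  // Rebuilds `row` (laid out per `fileSchema`) in the shape of `evolvedSchema`.
  // Fields missing from the file become null, both at the top level and
  // inside nested structs. Fields only present in the file are dropped.
  def pad(row: Seq[Any], fileSchema: Struct, evolvedSchema: Struct): Seq[Any] = {
    val fileIndex: Map[String, Int] =
      fileSchema.fields.map(_._1).zipWithIndex.toMap
    evolvedSchema.fields.map { case (name, evolvedType) =>
      fileIndex.get(name) match {
        case None => null // field was added after this file was written
        case Some(i) =>
          (row(i), fileSchema.fields(i)._2, evolvedType) match {
            case (nested: Seq[_], f: Struct, e: Struct) => pad(nested, f, e)
            case (value, _, _) => value
          }
      }
    }
  }
}
```

   In this model, a projection built only from the file schema would emit two 
columns for a file written before a third top-level field was added, while the 
evolved schema expects three. Wherever the padding ends up living, it has to 
run before rows are handed to code that assumes the evolved layout.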
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
