Re: [PR] feat: Lance schema evolution add column support [hudi]

via GitHub Sat, 17 Jan 2026 10:52:25 -0800


voonhous commented on code in PR #17904:
URL: https://github.com/apache/hudi/pull/17904#discussion_r2701351544



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/lance/SparkLanceReaderBase.scala:
##########
@@ -84,9 +85,21 @@ class SparkLanceReaderBase(enableVectorizedReader: Boolean) extends SparkColumna
         // Open Lance file reader
         val lanceReader = LanceFileReader.open(filePath, allocator)
 
-        // Extract column names from required schema for projection
-        val columnNames: java.util.List[String] = if (requiredSchema.nonEmpty) {
-          requiredSchema.fields.map(_.name).toList.asJava
+        // Get schema from Lance file
+        val arrowSchema = lanceReader.schema()
+        val fileSchema = LanceArrowUtils.fromArrowSchema(arrowSchema)
+
+        // Create lance schema evolution helper
+        val evolution = new LanceBasicSchemaEvolution(
+          fileSchema,
+          requiredSchema,

Review Comment:
   Just a nit: is there a difference between `requestSchema` and `requiredSchema`? We should keep the nomenclature the same. I traced the schema code 2 years ago, and the number of `*Schema` variables had me seeing stars.
   
   Although there are LLMs that can help us read through and break down code now, it's still good practice not to rename variables that mean the same thing, and to keep nomenclature consistent across the entire codebase as much as possible.


