hudi-agent commented on code in PR #18403:
URL: https://github.com/apache/hudi/pull/18403#discussion_r3186279309


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedFileFormat.scala:
##########
@@ -133,6 +134,24 @@ class HoodieFileGroupReaderBasedFileFormat(tablePath: 
String,
     }
   }
 
+  /**
+   * Whether the requested schema contains any top-level BLOB columns. Used to 
disable
+   * Lance batch mode for BLOB tables: the DESCRIPTOR-mode rewrite (and the 
OUT_OF_LINE
+   * dataโ†’null contract) lives only in the row-path BlobDescriptorTransform, 
and `supportBatch`
+   * cannot inspect read-time options (e.g. `hoodie.read.blob.inline.mode`) 
since it runs at
+   * planning time. Forcing row mode whenever BLOB columns are present is the 
simplest correct
+   * gate โ€” BLOB processing is per-row anyway (lazy byte materialization) so 
the perf delta is
+   * negligible.
+   */
+  private def schemaContainsBlobColumn(schema: StructType): Boolean = {
+    schema.fields.exists { f =>
+      val md = f.metadata
+      md != null && md.contains(HoodieSchema.TYPE_METADATA_FIELD) &&
+        
HoodieSchema.parseTypeDescriptor(md.getString(HoodieSchema.TYPE_METADATA_FIELD))
+          .getType == HoodieSchemaType.BLOB
+    }
+  }
+
   /**
    * Checks if the file format supports vectorized reading, please refer to 
SPARK-40918.
    *

Review Comment:
   ๐Ÿค– [Line 160] Nice fix for the BLOB case โ€” the comment here ("so the planner 
doesn't commit to ColumnarBatch output that the row-path reader cannot 
deliver") describes exactly the same risk that exists for the `hasTypeChanges` 
runtime fallback in `SparkLanceReaderBase.read`. When a COW Lance table has 
implicit schema evolution, `supportBatch` still returns true here (no BLOB), 
but at runtime any file with type changes returns `Iterator[InternalRow]` while 
Spark already expects `ColumnarBatch`. Worth extending this gate (e.g. also 
disable when `internalSchemaOpt.isPresent`) to close that path too?
   
   <sub><i>- AI-generated; verify before applying. React ๐Ÿ‘/๐Ÿ‘Ž to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to