wombatu-kun commented on code in PR #18403:
URL: https://github.com/apache/hudi/pull/18403#discussion_r3187027470
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedFileFormat.scala:
##########
@@ -133,6 +134,24 @@ class HoodieFileGroupReaderBasedFileFormat(tablePath:
String,
}
}
+  /**
+   * Whether the requested schema contains any top-level BLOB columns. Used to disable
+   * Lance batch mode for BLOB tables: the DESCRIPTOR-mode rewrite (and the OUT_OF_LINE
+   * data→null contract) lives only in the row-path BlobDescriptorTransform, and `supportBatch`
+   * cannot inspect read-time options (e.g. `hoodie.read.blob.inline.mode`) since it runs at
+   * planning time. Forcing row mode whenever BLOB columns are present is the simplest correct
+   * gate: BLOB processing is per-row anyway (lazy byte materialization), so the perf delta is
+   * negligible.
+   */
+  private def schemaContainsBlobColumn(schema: StructType): Boolean = {
+    schema.fields.exists { f =>
+      val md = f.metadata
+      md != null && md.contains(HoodieSchema.TYPE_METADATA_FIELD) &&
+        HoodieSchema.parseTypeDescriptor(md.getString(HoodieSchema.TYPE_METADATA_FIELD))
+          .getType == HoodieSchemaType.BLOB
+    }
+  }
+
/**
* Checks if the file format supports vectorized reading; please refer to SPARK-40918.
*
Review Comment:
Done in fe8f7e1c2a56. `lanceBatchSupported = !schemaContainsBlobColumn(schema) && !internalSchemaOpt.isPresent`. The doc comment above is rewritten to call out both triggers (DESCRIPTOR blob mode and implicit type changes via internal-schema evolution) and to explain that Spark's `ColumnarToRowExec` would throw a `ClassCastException` on the row-path iterator if either runtime fallback fired after the planner had committed to columnar output.
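To make the gate's behavior concrete, here is a minimal self-contained sketch of the planning-time decision, outside of Spark. `Field`, `TypeMetadataField`, and the simplified `lanceBatchSupported` signature below are hypothetical stand-ins for the real Spark `StructField` metadata and Hudi types, not the actual implementation:

```scala
// Hypothetical sketch of the planning-time batch gate: columnar (batch)
// reading is allowed only when neither runtime row-path fallback can fire.
object BatchGateSketch {
  // Simplified stand-in for StructField + its metadata map.
  case class Field(name: String, metadata: Map[String, String])

  // Assumed metadata key name, standing in for HoodieSchema.TYPE_METADATA_FIELD.
  val TypeMetadataField = "hoodie.schema.type"

  // Mirrors schemaContainsBlobColumn: any top-level field tagged BLOB?
  def containsBlobColumn(fields: Seq[Field]): Boolean =
    fields.exists(_.metadata.get(TypeMetadataField).contains("BLOB"))

  // Mirrors the reviewed expression: batch mode only if there are no BLOB
  // columns AND no internal-schema evolution is in play.
  def lanceBatchSupported(fields: Seq[Field], internalSchemaOpt: Option[Any]): Boolean =
    !containsBlobColumn(fields) && internalSchemaOpt.isEmpty

  def main(args: Array[String]): Unit = {
    val plain = Seq(Field("id", Map.empty), Field("payload", Map.empty))
    val blob  = Seq(Field("id", Map.empty), Field("img", Map(TypeMetadataField -> "BLOB")))
    println(lanceBatchSupported(plain, None))        // true: columnar path is safe
    println(lanceBatchSupported(blob, None))         // false: BLOB forces row mode
    println(lanceBatchSupported(plain, Some("evo"))) // false: schema evolution forces row mode
  }
}
```

The point of keeping both checks in the planner is exactly what the comment says: once `supportBatch` returns true, downstream operators assume `ColumnarBatch` output, so any row-path fallback must be ruled out before planning completes.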
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]