hudi-agent commented on code in PR #18665:
URL: https://github.com/apache/hudi/pull/18665#discussion_r3245404446
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -356,11 +357,15 @@ class HoodieSparkSqlWriterInternal {
val shouldReconcileSchema = parameters(DataSourceWriteOptions.RECONCILE_SCHEMA.key()).toBoolean
val latestTableSchemaOpt = getLatestTableSchema(tableMetaClient, schemaFromCatalog)
- val df = if (preppedWriteOperation || preppedSparkSqlWrites || preppedSparkSqlMergeInto || sourceDf.isStreaming) {
+ val dfPreBlobPad = if (preppedWriteOperation || preppedSparkSqlWrites || preppedSparkSqlMergeInto || sourceDf.isStreaming) {
Review Comment:
🤖 nit: `dfPreBlobPad` names the variable by its position in the pipeline
rather than by what it holds. Something like `dfWithoutMetaCols` or
`sourceDfStripped` would communicate the semantic purpose more directly to a
reader who hasn't just read the adjacent lines.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/avro/HoodieSparkSchemaConverters.scala:
##########
@@ -409,6 +583,13 @@ object HoodieSparkSchemaConverters extends SparkAdapterSupport {
private lazy val expectedBlobStructType: StructType =
toSqlType(HoodieSchema.createBlob())._1.asInstanceOf[StructType]
+ // Spark type of the canonical reference sub-struct ({external_path, offset, length, managed}).
+ // Used by padPartialBlobColumns to construct lit(null).cast(...) for the missing reference
+ // field when a user supplies an INLINE-only `{type, data}` blob struct.
+ private lazy val expectedBlobReferenceStructType: DataType =
+ expectedBlobStructType.fields
+ .find(_.name == HoodieSchema.Blob.EXTERNAL_REFERENCE).get.dataType
Review Comment:
🤖 nit: `.find(...).get` in a lazy val will surface as a bare
NoSuchElementException with no context if EXTERNAL_REFERENCE is ever absent
from expectedBlobStructType. Using `.getOrElse(throw new
IllegalStateException("Missing EXTERNAL_REFERENCE in canonical blob struct"))`
would give an actionable error if the schema ever changes.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
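To make the suggestion above concrete, here is a minimal sketch of the `getOrElse(throw ...)` pattern. It is not the PR's code: `Field`, `BlobSchemaLookup`, `referenceType`, and the literal field name `"reference"` are hypothetical stand-ins chosen so the snippet runs without Spark or Hudi on the classpath. In the real code the lookup would go through `expectedBlobStructType.fields` and `HoodieSchema.Blob.EXTERNAL_REFERENCE`.

```scala
// Hypothetical stand-in for Spark's StructField, so the sketch runs without Spark.
final case class Field(name: String, dataType: String)

object BlobSchemaLookup {
  // Assumed placeholder; the real constant is HoodieSchema.Blob.EXTERNAL_REFERENCE.
  val ExternalReference = "reference"

  // Looks up the reference field's type and fails with a descriptive
  // IllegalStateException instead of the bare NoSuchElementException
  // that Option.get would throw on a missing field.
  def referenceType(fields: Seq[Field]): String =
    fields
      .find(_.name == ExternalReference)
      .map(_.dataType)
      .getOrElse(throw new IllegalStateException(
        s"Missing '$ExternalReference' in canonical blob struct; found fields: " +
          fields.map(_.name).mkString("[", ", ", "]")))
}
```

The failure would still surface on first access of the lazy val, but the message names the missing field and lists the fields that were present, which should make a schema drift easier to diagnose.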