Re: [PR] feat(schema): Add support to write shredded variants for HoodieRecordType.SPARK [hudi]

via GitHub Sat, 04 Apr 2026 15:08:14 -0700


yihua commented on code in PR #18036:
URL: https://github.com/apache/hudi/pull/18036#discussion_r3036105069



##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/util/HoodieSchemaConverter.java:
##########
@@ -554,8 +554,7 @@ private static DataType convertUnion(HoodieSchema schema) {
 
   /**
    * Converts a Variant schema to Flink's ROW type.
-   * Variant is represented as ROW<`metadata` BYTES, `value` BYTES> in Flink.
-   * // TODO: We are only supporting unshredded for now, support shredded in 
the future
+   * Variant is represented as ROW<`value` BYTES, `metadata` BYTES> in Flink.

Review Comment:
   <a href="#"><img alt="P2" 
src="https://greptile-static-assets.s3.amazonaws.com/badges/p2.svg?v=7"; 
align="top"></a> **Reversed field order in updated Javadoc**
   
   The updated comment says `ROW<value BYTES, metadata BYTES>` (value first), 
but the actual implementation at lines 569–572 creates the Flink ROW with 
`metadata` as the first field and `value` as the second:
   
   ```java
   return DataTypes.ROW(
       DataTypes.FIELD("metadata", DataTypes.BYTES().notNull()),
       DataTypes.FIELD("value", DataTypes.BYTES().notNull())
   ).notNull();
   ```
   
   The comment should read `ROW<metadata BYTES, value BYTES>` to match the 
code. The previous comment was actually correct about the field order (it only 
had the wrong intent about which field came first positionally).
   
   ```suggestion
      * Variant is represented as ROW<`metadata` BYTES, `value` BYTES> in Flink.
   ```
   
   — *Greptile* 
([original](https://github.com/yihua/hudi/pull/12#discussion_r3036097637)) 
(source:comment#3036097637)



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowParquetWriteSupport.java:
##########
@@ -391,6 +542,9 @@ private ValueWriter makeWriter(HoodieSchema schema, 
DataType dataType) {
           }
         });
       };
+    } else if (dataType instanceof StructType
+        && 
SparkAdapterSupport$.MODULE$.sparkAdapter().isVariantShreddingStruct((StructType)
 dataType)) {
+      return makeShreddedVariantWriter((StructType) dataType);

Review Comment:
   _⚠️ Potential issue_ | _🔴 Critical_
   
   **Preserve `HoodieSchema` when building shredded child writers.**
   
   Line 571 recreates every shredded child writer with `null` schema, so 
schema-only semantics are lost on the write path. A shredded Variant that 
contains a millis timestamp, for example, will still be written through the 
default micros branch here, while `convert(...)` derives the Parquet logical 
type from the real `HoodieSchema`. That gives you a schema/data mismatch for 
affected shredded variants.
   
   
   <details>
   <summary>Suggested fix</summary>
   
   ```diff
   -    } else if (dataType instanceof StructType
   -        && 
SparkAdapterSupport$.MODULE$.sparkAdapter().isVariantShreddingStruct((StructType)
 dataType)) {
   -      return makeShreddedVariantWriter((StructType) dataType);
   +    } else if (dataType instanceof StructType
   +        && 
SparkAdapterSupport$.MODULE$.sparkAdapter().isVariantShreddingStruct((StructType)
 dataType)) {
   +      return makeShreddedVariantWriter((StructType) dataType, 
resolvedSchema);
        } else if (dataType instanceof StructType) {
   ```
   
   ```diff
   -  private ValueWriter makeShreddedVariantWriter(StructType 
shreddedStructType) {
   -    // Create writers for the shredded struct fields
   -    // The shreddedStructType contains: metadata (binary), value (binary), 
typed_value (optional)
   -    ValueWriter[] shreddedFieldWriters = 
Arrays.stream(shreddedStructType.fields())
   -        .map(field -> makeWriter(null, field.dataType()))
   -        .toArray(ValueWriter[]::new);
   +  private ValueWriter makeShreddedVariantWriter(StructType 
shreddedStructType, HoodieSchema variantSchema) {
   +    ValueWriter[] shreddedFieldWriters = 
getFieldWriters(shreddedStructType, variantSchema);
   ```
   </details>
   
   
   Also applies to: 567-585
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowParquetWriteSupport.java`
   around lines 545 - 547, HoodieSchema is being dropped when creating shredded
   child writers, causing schema-only semantics (e.g., millis vs micros 
timestamps)
   to be lost; update the calls that build shredded writers in
   HoodieRowParquetWriteSupport (notably makeShreddedVariantWriter and the other
   shredded-child factory calls in the same branch range) to pass the current
   HoodieSchema through to the writer factory (or add an overload that accepts
   HoodieSchema), and update those factory/signature implementations so
   convert(...) and any Parquet logical-type resolution use the provided
   HoodieSchema rather than null; apply the same schema propagation to all 
shredded
   writer creation sites in the 567-585 region so timestamps and other logical
   types are derived from the real HoodieSchema.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:medusa:grasshopper:d884afb5-4216-4148-afc5-ef93118769b9 
-->
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/12#discussion_r3036104555)) 
(source:comment#3036104555)



##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -2463,6 +2569,60 @@ public Option<HoodieSchema> getTypedValueField() {
       return typedValueSchema;
     }
 
+    /**
+     * Returns the typed_value schema with plain (unwrapped) types suitable 
for Spark shredding utilities, i.e. essentially removing the `value` field
+     *
+     * <p>If the typed_value follows the variant shredding spec (each field is 
a struct with
+     * {@code {value: bytes, typed_value: <type>}}), this extracts only the 
inner typed_value types and returns a record schema containing just those plain 
types.</p>
+     *
+     * <p>If the typed_value is already in plain form (created with {@code 
createVariantShredded}),
+     * returns the schema as-is.</p>
+     *
+     * @return Option containing the plain typed_value schema, or 
Option.empty() if not present
+     */
+    public Option<HoodieSchema> getPlainTypedValueSchema() {
+      if (!typedValueSchema.isPresent()) {
+        return Option.empty();
+      }
+      HoodieSchema tvSchema = typedValueSchema.get();
+      if (tvSchema.getType() != HoodieSchemaType.RECORD) {
+        return typedValueSchema;
+      }
+
+      List<HoodieSchemaField> fields = tvSchema.getFields();
+      // Check if all fields follow the nested shredding pattern: each field 
is a record with {value, typed_value}
+      boolean isNestedForm = !fields.isEmpty() && 
fields.stream().allMatch(field -> {
+        HoodieSchema fieldSchema = field.schema();
+        if (fieldSchema.isNullable()) {
+          fieldSchema = fieldSchema.getNonNullType();
+        }
+        if (fieldSchema.getType() != HoodieSchemaType.RECORD) {
+          return false;
+        }
+        Option<HoodieSchemaField> valueSubField = 
fieldSchema.getField(VARIANT_VALUE_FIELD);
+        Option<HoodieSchemaField> typedValueSubField = 
fieldSchema.getField(VARIANT_TYPED_VALUE_FIELD);
+        return valueSubField.isPresent() && typedValueSubField.isPresent()
+            && fieldSchema.getFields().size() == 2;
+      });

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   **Tighten the wrapper detection before flattening `typed_value`.**
   
   `isNestedForm` only checks field names and field count. Any ordinary record 
whose fields happen to be `value` and `typed_value` will be unwrapped here and 
lose its `value` member. Check the actual shredded-wrapper contract instead of 
names alone.
   
   <details>
   <summary>♻️ Suggested tightening</summary>
   
   ```diff
   -      boolean isNestedForm = !fields.isEmpty() && 
fields.stream().allMatch(field -> {
   +      boolean isNestedForm = !fields.isEmpty() && 
fields.stream().allMatch(field -> {
            HoodieSchema fieldSchema = field.schema();
            if (fieldSchema.isNullable()) {
              fieldSchema = fieldSchema.getNonNullType();
            }
            if (fieldSchema.getType() != HoodieSchemaType.RECORD) {
              return false;
            }
            Option<HoodieSchemaField> valueSubField = 
fieldSchema.getField(VARIANT_VALUE_FIELD);
            Option<HoodieSchemaField> typedValueSubField = 
fieldSchema.getField(VARIANT_TYPED_VALUE_FIELD);
   -        return valueSubField.isPresent() && typedValueSubField.isPresent()
   -            && fieldSchema.getFields().size() == 2;
   +        return valueSubField.isPresent()
   +            && valueSubField.get().schema().isNullable()
   +            && valueSubField.get().schema().getNonNullType().getType() == 
HoodieSchemaType.BYTES
   +            && typedValueSubField.isPresent()
   +            && typedValueSubField.get().schema().isNullable()
   +            && fieldSchema.getFields().size() == 2;
          });
   ```
   </details>
   
   <!-- suggestion_start -->
   
   <details>
   <summary>📝 Committable suggestion</summary>
   
   > ‼️ **IMPORTANT**
   > Carefully review the code before committing. Ensure that it accurately 
replaces the highlighted code, contains no missing lines, and has no issues 
with indentation. Thoroughly test & benchmark the code to ensure it meets the 
requirements.
   
   ```suggestion
         List<HoodieSchemaField> fields = tvSchema.getFields();
         // Check if all fields follow the nested shredding pattern: each field 
is a record with {value, typed_value}
         boolean isNestedForm = !fields.isEmpty() && 
fields.stream().allMatch(field -> {
           HoodieSchema fieldSchema = field.schema();
           if (fieldSchema.isNullable()) {
             fieldSchema = fieldSchema.getNonNullType();
           }
           if (fieldSchema.getType() != HoodieSchemaType.RECORD) {
             return false;
           }
           Option<HoodieSchemaField> valueSubField = 
fieldSchema.getField(VARIANT_VALUE_FIELD);
           Option<HoodieSchemaField> typedValueSubField = 
fieldSchema.getField(VARIANT_TYPED_VALUE_FIELD);
           return valueSubField.isPresent()
               && valueSubField.get().schema().isNullable()
               && valueSubField.get().schema().getNonNullType().getType() == 
HoodieSchemaType.BYTES
               && typedValueSubField.isPresent()
               && typedValueSubField.get().schema().isNullable()
               && fieldSchema.getFields().size() == 2;
         });
   ```
   
   </details>
   
   <!-- suggestion_end -->
   
   <!-- 
fingerprinting:phantom:medusa:grasshopper:17576896-9233-4f28-812a-0d5a3e064d5f 
-->
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/12#discussion_r3036104560)) 
(source:comment#3036104560)



##########
hudi-spark-datasource/hudi-spark4.0.x/src/test/java/org/apache/hudi/io/storage/row/TestHoodieRowParquetWriteSupportVariant.java:
##########
@@ -155,9 +190,71 @@ public void testWriteShreddedVariant() throws IOException {
     assertEquals(BINARY, valueField.asPrimitiveType().getPrimitiveTypeName());
     assertEquals(OPTIONAL, valueField.getRepetition(), "Shredded variant value 
must be OPTIONAL");
 
-    // Verify typed_value is omitted (as implementation skips it)
     boolean hasTypedValue = vGroup.getFields().stream().anyMatch(f -> 
f.getName().equals("typed_value"));
-    assertFalse(hasTypedValue, "typed_value field should be omitted in this 
writer implementation");
+    assertTrue(hasTypedValue, "typed_value field should be omitted in this 
writer implementation");

Review Comment:
   _⚠️ Potential issue_ | _🟡 Minor_
   
   **Fix the inverted assertion message.**
   
   This assertion expects `typed_value` to exist, but the failure message says 
it should be omitted. If it trips, the output will be misleading.
   
   <details>
   <summary>Suggested fix</summary>
   
   ```diff
   -    assertTrue(hasTypedValue, "typed_value field should be omitted in this 
writer implementation");
   +    assertTrue(hasTypedValue, "typed_value field should be present in 
shredded schemas");
   ```
   </details>
   
   <!-- suggestion_start -->
   
   <details>
   <summary>📝 Committable suggestion</summary>
   
   > ‼️ **IMPORTANT**
   > Carefully review the code before committing. Ensure that it accurately 
replaces the highlighted code, contains no missing lines, and has no issues 
with indentation. Thoroughly test & benchmark the code to ensure it meets the 
requirements.
   
   ```suggestion
       boolean hasTypedValue = vGroup.getFields().stream().anyMatch(f -> 
f.getName().equals("typed_value"));
       assertTrue(hasTypedValue, "typed_value field should be present in 
shredded schemas");
   ```
   
   </details>
   
   <!-- suggestion_end -->
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-spark-datasource/hudi-spark4.0.x/src/test/java/org/apache/hudi/io/storage/row/TestHoodieRowParquetWriteSupportVariant.java`
   around lines 193 - 194, The assertion is inverted: currently it uses
   assertTrue(hasTypedValue, ...) but the message says "typed_value field 
should be
   omitted"; change the assertion to assertFalse(hasTypedValue, "typed_value 
field
   should be omitted in this writer implementation") so it correctly fails with 
the
   existing message (or alternatively keep assertTrue and update the message to
   reflect presence). Update the check around vGroup and the local boolean
   hasTypedValue accordingly.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:medusa:grasshopper:d884afb5-4216-4148-afc5-ef93118769b9 
-->
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/12#discussion_r3036104566)) 
(source:comment#3036104566)



##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/util/HoodieSchemaConverter.java:
##########
@@ -554,8 +554,7 @@ private static DataType convertUnion(HoodieSchema schema) {
 
   /**
    * Converts a Variant schema to Flink's ROW type.
-   * Variant is represented as ROW<`metadata` BYTES, `value` BYTES> in Flink.
-   * // TODO: We are only supporting unshredded for now, support shredded in 
the future
+   * Variant is represented as ROW<`value` BYTES, `metadata` BYTES> in Flink.

Review Comment:
   _⚠️ Potential issue_ | _🟡 Minor_
   
   **Fix Variant field-order Javadoc to match the actual schema order.**
   
   Line 557 documents `ROW<value BYTES, metadata BYTES>`, but `convertVariant` 
builds `metadata` then `value` (Line 570 then Line 571). This also conflicts 
with 
`hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:680-710`
 and tests in 
`hudi-client/hudi-flink-client/src/test/java/org/apache/hudi/util/TestHoodieSchemaConverter.java:641-678`
 and 
`hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestVariantCrossEngineCompatibility.java:60-105`.
   
   <details>
   <summary>Suggested doc fix</summary>
   
   ```diff
   -   * Variant is represented as ROW<`value` BYTES, `metadata` BYTES> in 
Flink.
   +   * Variant is represented as ROW<metadata BYTES, `value` BYTES> in Flink.
   ```
   </details>
   
   <!-- suggestion_start -->
   
   <details>
   <summary>📝 Committable suggestion</summary>
   
   > ‼️ **IMPORTANT**
   > Carefully review the code before committing. Ensure that it accurately 
replaces the highlighted code, contains no missing lines, and has no issues 
with indentation. Thoroughly test & benchmark the code to ensure it meets the 
requirements.
   
   ```suggestion
      * Variant is represented as ROW<metadata BYTES, `value` BYTES> in Flink.
   ```
   
   </details>
   
   <!-- suggestion_end -->
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/util/HoodieSchemaConverter.java`
   at line 557, Update the Javadoc for the Variant field-order in
   HoodieSchemaConverter to reflect the actual order used by convertVariant: 
change
   the documented schema from ROW<value BYTES, metadata BYTES> to ROW<metadata
   BYTES, value BYTES> so it matches the implementation in convertVariant and 
the
   canonical order in HoodieSchema and related tests; verify the comment near 
the
   convertVariant method and any other Javadoc mentions in HoodieSchemaConverter
   mention metadata first then value.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:poseidon:hawk:554bb875-cb9a-4d20-8fbc-a68f96f5bfda -->
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/12#discussion_r3036104552)) 
(source:comment#3036104552)



##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -866,6 +866,112 @@ public static HoodieSchema.Vector createVector(String 
name, int dimension, Vecto
     return new HoodieSchema.Vector(vectorSchema);
   }
 
+  /**
+   * Creates a shredded field struct per the Parquet variant shredding spec. 
The returned struct contains two nullable fields:
+   * <ul>
+   *   <li>{@code value}: nullable bytes (fallback binary representation)</li>
+   *   <li>{@code typed_value}: nullable type (the typed representation)</li>
+   * </ul>
+   *
+   * <p>Example output structure:
+   * <pre>
+   *   fieldName: struct
+   *     |-- value: binary (nullable)
+   *     |-- typed_value: &lt;fieldType&gt; (nullable)
+   * </pre></p>
+   *
+   * @param fieldName the name for the record (used as the Avro record name)
+   * @param fieldType the schema for the typed_value within this field
+   * @return a new HoodieSchema representing the shredded field struct
+   */
+  public static HoodieSchema createShreddedFieldStruct(String fieldName, 
HoodieSchema fieldType) {
+    ValidationUtils.checkArgument(fieldName != null && !fieldName.isEmpty(), 
"Field name cannot be null or empty");
+    ValidationUtils.checkArgument(fieldType != null, "Field type cannot be 
null");
+    List<HoodieSchemaField> fields = Arrays.asList(
+        HoodieSchemaField.of(
+            Variant.VARIANT_VALUE_FIELD,
+            HoodieSchema.createNullable(HoodieSchemaType.BYTES),
+            "Fallback binary representation",
+            NULL_VALUE
+        ),
+        HoodieSchemaField.of(
+            Variant.VARIANT_TYPED_VALUE_FIELD,
+            HoodieSchema.createNullable(fieldType),
+            "Typed value representation",
+            NULL_VALUE
+        )
+    );
+    return HoodieSchema.createRecord(fieldName, null, null, fields);
+  }

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   <details>
   <summary>🧩 Analysis chain</summary>
   
   🏁 Script executed:
   
   ```shell
   find . -type f -name "HoodieSchema.java" | head -5
   ```
   
   Repository: yihua/hudi
   
   Length of output: 129
   
   ---
   
   🏁 Script executed:
   
   ```shell
   cat -n 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java | sed 
-n '887,905p'
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1036
   
   ---
   
   🏁 Script executed:
   
   ```shell
   cat -n 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java | sed 
-n '953,972p'
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1223
   
   ---
   
   🏁 Script executed:
   
   ```shell
   rg "public static.*createRecord" 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java -A 3
   ```
   
   Repository: yihua/hudi
   
   Length of output: 562
   
   ---
   
   🏁 Script executed:
   
   ```shell
   rg "Variant\\.VARIANT_TYPED_VALUE_FIELD|Variant\\.VARIANT_VALUE_FIELD" 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java | 
head -20
   ```
   
   Repository: yihua/hudi
   
   Length of output: 442
   
   ---
   
   🏁 Script executed:
   
   ```shell
   rg "createShreddedFieldStruct|createVariantShreddedObject" 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java -B 2 
-A 2
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1578
   
   ---
   
   🏁 Script executed:
   
   ```shell
   find . -type f -name "*Test*.java" | xargs grep -l 
"createShreddedFieldStruct\|createVariantShreddedObject" 2>/dev/null | head -5
   ```
   
   Repository: yihua/hudi
   
   Length of output: 263
   
   ---
   
   🏁 Script executed:
   
   ```shell
   rg "createShreddedFieldStruct|createVariantShreddedObject" 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
-B 5 -A 10
   ```
   
   Repository: yihua/hudi
   
   Length of output: 4581
   
   ---
   
   🏁 Script executed:
   
   ```shell
   rg "createShreddedFieldStruct|createVariantShreddedObject" 
./hudi-spark-datasource/hudi-spark4.0.x/src/test/java/org/apache/hudi/io/storage/row/TestHoodieRowParquetWriteSupportVariant.java
 -B 5 -A 10
   ```
   
   Repository: yihua/hudi
   
   Length of output: 7562
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check Avro schema naming rules - look for any comments or documentation 
about named type collisions
   rg "named.*type|duplicate.*record|global.*name|Avro.*schema" 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java | 
head -20
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1519
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Get the constant values for field names
   rg "VARIANT_VALUE_FIELD|VARIANT_TYPED_VALUE_FIELD" 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java | 
head -10
   ```
   
   Repository: yihua/hudi
   
   Length of output: 634
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Look for tests with multiple variant fields at same level
   rg "\.put\(" 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
| grep -i shredded -A 2 -B 2
   ```
   
   Repository: yihua/hudi
   
   Length of output: 696
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if there are tests with two variants in same record
   grep -n "createVariantShreddedObject" 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
| wc -l
   ```
   
   Repository: yihua/hudi
   
   Length of output: 55
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Look for the actual Avro schema generation - check the toString() output 
to see if named types are duplicated
   sed -n '1,100p' 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
| head -50
   ```
   
   Repository: yihua/hudi
   
   Length of output: 2069
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if there's any test creating a record with multiple variant fields
   rg "createRecord.*HoodieSchemaField.*variant" 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
-B 3 -A 3
   ```
   
   Repository: yihua/hudi
   
   Length of output: 36
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check for Avro native schema behavior - look at what getAvroSchema() 
returns for a shredded field struct
   cat -n 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java | sed 
-n '1,100p'
   ```
   
   Repository: yihua/hudi
   
   Length of output: 4547
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Look at what the namespace argument actually does in the signature
   cat -n 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java | sed 
-n '700,750p'
   ```
   
   Repository: yihua/hudi
   
   Length of output: 2843
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check for tests with multiple variant fields in the same record structure
   grep -n "createVariantShreddedObject" 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
| head -20
   ```
   
   Repository: yihua/hudi
   
   Length of output: 507
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Look more carefully at how field names are used as record names
   cat -n 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java | sed 
-n '887,975p'
   ```
   
   Repository: yihua/hudi
   
   Length of output: 4684
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if there's path-based naming already used elsewhere in the code
   rg "namespace|path" 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java | 
grep -i "name\|path\|fully" | head -20
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1665
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Search for how schemas are serialized - what happens when you have 
duplicate named types
   rg "toString\|toJSON\|serialize" 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java | 
head -10
   ```
   
   Repository: yihua/hudi
   
   Length of output: 36
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Look at test that does round-trip to see if named type collisions would be 
caught
   grep -n "roundTrip\|parse\|toString" 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
| head -20
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1284
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Let me check if there's a test case that would catch the namespace bug at 
line 970-971
   sed -n '2950,2980p' 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1461
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Let's see what testCreateVariantShreddedObjectWithCustomName actually tests
   sed -n '2945,2975p' 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1437
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if namespace is properly propagated in the test
   rg "getNamespace\|namespace" 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
| grep -A 2 -B 2 "Shredded"
   ```
   
   Repository: yihua/hudi
   
   Length of output: 36
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Generate a test case to see what actually gets created - check the avro 
schema output
   cat > /tmp/test_collision.py << 'EOF'
   # Simulate what happens with the namespace bug
   params = {
       "name": "typed_value",
       "namespace": None,
       "doc": "org.apache.hudi",  # BUG: namespace passed as doc
       "fields": []
   }
   
   print(f"Expected: name=typed_value, namespace='org.apache.hudi', doc=None")
   print(f"Actual: name={params['name']}, namespace={params['namespace']}, 
doc={params['doc']}")
   print()
   print("Impact: The typed_value record will have NO namespace, even though 
namespace was intended")
   EOF
   python /tmp/test_collision.py
   ```
   
   Repository: yihua/hudi
   
   Length of output: 272
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check what the actual Avro Schema.createRecord signature expects
   rg "Schema.createRecord" 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java -A 1 
| head -20
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1098
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Let me verify the exact issue - check what the inner typed_value record 
looks like in the generated schema
   sed -n '2970,3010p' 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java
   ```
   
   Repository: yihua/hudi
   
   Length of output: 2035
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Look for any test that checks the namespace of the typed_value field
   rg "typed_value.*namespace\|getField.*typed_value" 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
-B 3 -A 3
   ```
   
   Repository: yihua/hudi
   
   Length of output: 36
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Let me check if there's actually a test verifying the namespace on nested 
structures
   sed -n '2999,3030p' 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1833
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if the namespace parameter affects the typed_value record by looking 
at the actual Avro output
   cat > /tmp/test_ns.py << 'EOF'
   # In line 970-971:
   # HoodieSchema.createRecord(Variant.VARIANT_TYPED_VALUE_FIELD, null, 
namespace, typedValueFields)
   # Maps to createRecord(String name, String namespace, String doc, ...)
   # Which becomes: createRecord(name, doc, namespace, false, ...)
   # So: name="typed_value", doc=null, namespace=namespace (the param)
   # This LOOKS CORRECT when you trace the conversion!
   
   # Wait - let me check the parameter order in the overload:
   # First overload: createRecord(String name, String namespace, String doc, 
List fields)
   # This calls: createRecord(name, doc, namespace, false, fields)
   # Note the REORDERING: (name, doc, namespace, false, fields)
   
   # So in line 970-971:
   # createRecord(VARIANT_TYPED_VALUE_FIELD, null, namespace, typedValueFields)
   # Maps to the first overload:
   # - name = VARIANT_TYPED_VALUE_FIELD
   # - namespace = null  <-- THIS IS THE SECOND PARAM
   # - doc = namespace   <-- THIS IS THE THIRD PARAM
   # Then it calls: createRecord(VARIANT_TYPED_VALUE_FIELD, namespace, null, 
false, typedValueFields)
   # Which becomes: createRecord(name, doc, namespace, false, fields)
   # Final: name=VARIANT_TYPED_VALUE_FIELD, doc=namespace, namespace=null
   
   # CONFIRMED: namespace goes to doc slot, null goes to namespace slot!
   
   print("BUG CONFIRMED: Line 970-971 passes arguments in wrong order")
   print("The 'namespace' parameter ends up in the 'doc' slot")
   print("The 'null' literal ends up in the 'namespace' slot")
   EOF
   python /tmp/test_ns.py
   ```
   
   Repository: yihua/hudi
   
   Length of output: 216
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Look for a test with multiple variant fields in same record
   grep -n "HoodieSchemaField.of.*variant\|createVariantShreddedObject" 
./hudi-spark-datasource/hudi-spark4.0.x/src/test/java/org/apache/hudi/io/storage/row/TestHoodieRowParquetWriteSupportVariant.java
 | head -30
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1854
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if there are any tests creating multiple variant fields at sibling 
level
   sed -n '1,50p' 
./hudi-spark-datasource/hudi-spark4.0.x/src/test/java/org/apache/hudi/io/storage/row/TestHoodieRowParquetWriteSupportVariant.java
   ```
   
   Repository: yihua/hudi
   
   Length of output: 2362
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if any test validates the namespace of the inner typed_value record
   grep -n "getField.*typed_value\|getNamespace" 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
| grep -A 5 -B 5 "typed_value"
   ```
   
   Repository: yihua/hudi
   
   Length of output: 36
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Let's verify if the bug actually breaks anything - check what happens to 
the typed_value record with namespace param
   cat > /tmp/verify_bug.py << 'EOF'
   # Simulate the createRecord call at line 970-971
   # First overload: createRecord(String name, String namespace, String doc, 
List fields)
   # Implementation calls: createRecord(name, doc, namespace, false, fields)
   
   # So when calling: createRecord("typed_value", null, "org.apache.hudi", 
typedValueFields)
   # Maps to: createRecord(name="typed_value", namespace=null, 
doc="org.apache.hudi", fields=typedValueFields)
   # Implementation does: createRecord("typed_value", "org.apache.hudi", null, 
false, typedValueFields)
   # Then calls Avro: Schema.createRecord(name, doc, namespace, isError)
   # Which is: Schema.createRecord("typed_value", "org.apache.hudi", null, 
false)
   # Result: name="typed_value", doc="org.apache.hudi", namespace=null
   
   # So the typed_value record LOSES its namespace and gets the namespace as 
doc instead
   
   # Avro fully qualified name: namespace.name
   # With bug: null."typed_value" = just "typed_value" 
   # Without bug: "org.apache.hudi"."typed_value" = 
"org.apache.hudi.typed_value"
   
   # This MATTERS because:
   # 1. Named types in Avro are resolved by fully qualified name
   # 2. Without namespace, "typed_value" is in the default namespace
   # 3. If you have multiple variants with typed_value records, they all map to 
the same FQN
   # 4. Avro will deduplicate/reuse the same named type definition
   # 5. This could cause issues with schema merging or validation
   
   print("BUG IMPACT: typed_value records lose their intended namespace")
   print("This could cause named type collision/reuse across multiple variants")
   EOF
   python /tmp/verify_bug.py
   ```
   
   Repository: yihua/hudi
   
   Length of output: 184
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if there's a test that creates multiple variant fields in a single 
record
   rg 
"HoodieSchemaField.of.*createVariantShreddedObject|HoodieSchemaField.of.*variant.*HoodieSchemaField.of.*variant"
 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
-B 5
   ```
   
   Repository: yihua/hudi
   
   Length of output: 36
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Look for any collision detection in the schema or parser code
   rg "duplicate\|collision\|uniqueness\|already.*defined" 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java
   ```
   
   Repository: yihua/hudi
   
   Length of output: 36
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if any real code or tests create multiple variant fields in same 
record
   rg "variant.*variant" 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
| head -10
   ```
   
   Repository: yihua/hudi
   
   Length of output: 824
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Let me look at the full scope/nesting context - are field structs at 
different scopes or same?
   cat > /tmp/analyze_scope.py << 'EOF'
   # Structure from line 953-972:
   # createVariantShreddedObject creates:
   # {
   #   "metadata": bytes,
   #   "value": binary,  
   #   "typed_value": {
   #     "a": { "value": binary, "typed_value": <INT> },
   #     "b": { "value": binary, "typed_value": <STRING> },
   #     "c": { "value": binary, "typed_value": <DECIMAL> }
   #   }
   # }
   
   # Named types created:
   # 1. Variant record (via createVariantShredded): name="variant", namespace 
from param
   # 2. typed_value record (line 970): name="typed_value", namespace=NULL 
(bug), should be from param
   # 3. a struct (via createShreddedFieldStruct): name="a", namespace=null
   # 4. b struct: name="b", namespace=null
   # 5. c struct: name="c", namespace=null
   
   # Avro named type fully qualified names:
   # 1. "variant" (or "org.apache.hudi.variant" if namespace provided)
   # 2. "typed_value" (SHOULD be "org.apache.hudi.typed_value" but namespace 
bug makes it just "typed_value")
   # 3. "a" (in default namespace)
   # 4. "b" (in default namespace)
   # 5. "c" (in default namespace)
   
   # COLLISION SCENARIO:
   # If you have TWO variant fields in same parent record:
   # Field v1 -> variant -> typed_value -> { a, b, c }
   # Field v2 -> variant -> typed_value -> { x, y, z }
   # 
   # Both create a named record called "typed_value" with no namespace
   # These could be seen as the same named type by Avro, causing collision
   
   # The field-level structs (a, b, c, x, y, z) are at different scopes within
   # their respective typed_value records, so they wouldn't collide
   
   print("ACTUAL COLLISION RISK:")
   print("1. Multiple variant fields in same record would create duplicate 
'typed_value' records")
   print("2. The inner field structs (a, b, c) are scoped so wouldn't collide")
   print("3. The PRIMARY bug is at line 970: namespace parameter ignored for 
typed_value")
   print("4. SECONDARY concern: field structs also get no namespace (line 904)")
   EOF
   python /tmp/analyze_scope.py
   ```
   
   Repository: yihua/hudi
   
   Length of output: 379
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Verify if Avro actually enforces global uniqueness for named types - check 
what Avro version is used
   rg "avro" hudi-common/pom.xml | head -5
   ```
   
   Repository: yihua/hudi
   
   Length of output: 350
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Let me check what happens when you serialize/deserialize multiple variants
   cat > /tmp/test_multi_variant.py << 'EOF'
   # If you have a record like:
   # {
   #   "v1": variant with shredded {a, b},
   #   "v2": variant with shredded {c, d}
   # }
   #
   # Avro schema would have:
   # v1 -> variant record -> typed_value record
   # v2 -> variant record -> typed_value record
   #
   # Both typed_value records have the same name but no namespace
   # In Avro's named type registry, they might collide
   
   # Let me check if the tests actually serialize/deserialize multiple variants
   print("Need to check if round-trip works with multiple variants in same 
record")
   EOF
   python /tmp/test_multi_variant.py
   ```
   
   Repository: yihua/hudi
   
   Length of output: 125
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if there are any tests with multiple variants in same schema
   grep -n "variant.*variant" 
./hudi-spark-datasource/hudi-spark4.0.x/src/test/java/org/apache/hudi/io/storage/row/TestHoodieRowParquetWriteSupportVariant.java
 | head -5
   ```
   
   Repository: yihua/hudi
   
   Length of output: 546
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Look at the actual namespace handling - does it preserve namespace in 
field structs?
   sed -n '960,972p' 
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java
   ```
   
   Repository: yihua/hudi
   
   Length of output: 622
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Let me trace through one more time to be absolutely certain about the bug
   cat > /tmp/final_verification.py << 'EOF'
   # Line 970-971 call:
   # createRecord(Variant.VARIANT_TYPED_VALUE_FIELD, null, namespace, 
typedValueFields)
   
   # Method signature (line from earlier output):
   # public static HoodieSchema createRecord(String name, String namespace, 
String doc, List<HoodieSchemaField> fields)
   
   # Parameter mapping:
   call_args = ["Variant.VARIANT_TYPED_VALUE_FIELD", "null", "namespace", 
"typedValueFields"]
   method_params = ["name", "namespace", "doc", "fields"]
   
   print("Call: createRecord(VARIANT_TYPED_VALUE_FIELD, null, namespace, 
typedValueFields)")
   print("Signature: createRecord(String name, String namespace, String doc, 
List fields)")
   print()
   for i, (call, param) in enumerate(zip(call_args, method_params)):
       print(f"Argument {i+1}: {call:40s} → parameter: {param}")
   
   print("\nFirst overload then calls: createRecord(name, doc, namespace, 
false, fields)")
   print("Which becomes: createRecord(VARIANT_TYPED_VALUE_FIELD, namespace, 
null, false, typedValueFields)")
   print()
   
   # Now map to second overload
   second_call_args = ["VARIANT_TYPED_VALUE_FIELD", "namespace", "null", 
"false", "typedValueFields"]
   second_method_params = ["name", "doc", "namespace", "isError", "fields"]
   
   print("Second overload signature: createRecord(String name, String doc, 
String namespace, boolean isError, List fields)")
   print()
   for i, (call, param) in enumerate(zip(second_call_args, 
second_method_params)):
       correct = "✓" if (i==0 and call=="VARIANT_TYPED_VALUE_FIELD" and 
param=="name") or \
                         (i==1 and call=="namespace" and param=="doc") or \
                         (i==2 and call=="null" and param=="namespace") or \
                         (i==3 and call=="false" and param=="isError") or \
                         (i==4 and call=="typedValueFields" and 
param=="fields") else "✗"
       print(f"Argument {i+1}: {call:40s} → parameter: {param:12s} {correct}")
   
   print("\n⚠️  BUG CONFIRMED: 'namespace' ends up in 'doc' slot, 'null' ends 
up in 'namespace' slot")
   EOF
   python /tmp/final_verification.py
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1290
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if there's a test that validates nested variant namespace 
preservation
   grep -n "testCreateVariantShreddedObjectWithCustomName" -A 20 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
| head -25
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1194
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Does the test check the typed_value field's namespace?
   sed -n '2954,2974p' 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1012
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if the test validates the namespace of the nested typed_value record
   grep -n "getTypedValueField\|namespace" 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java 
| grep -A 3 -B 3 "2954\|2970\|testCreateVariantShreddedObjectWithCustomName"
   ```
   
   Repository: yihua/hudi
   
   Length of output: 36
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Let me verify: does testCreateVariantShreddedObjectWithCustomName check 
typed_value namespace?
   sed -n '2950,2962p' 
./hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java
   ```
   
   Repository: yihua/hudi
   
   Length of output: 709
   
   ---
   
   
   
   </details>
   
   **Fix namespace argument order for the `typed_value` record at line 
970–971.**
   
   The `namespace` parameter is passed as the third argument (mapped to `doc`) 
instead of the second (mapped to `namespace`):
   ```java
   HoodieSchema typedValueSchema = HoodieSchema.createRecord(
       Variant.VARIANT_TYPED_VALUE_FIELD, null, namespace, typedValueFields);
   ```
   
   Should be:
   ```java
   HoodieSchema typedValueSchema = HoodieSchema.createRecord(
       Variant.VARIANT_TYPED_VALUE_FIELD, namespace, null, typedValueFields);
   ```
   
   When `createVariantShreddedObject` is called with a `namespace` parameter, 
the `typed_value` record loses it. Multiple variant fields in the same parent 
record will both create unqualified `typed_value` records, causing schema 
collision during serialization or round-trip parsing.
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In 
`@hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java`
   around lines 887 - 905, The typed_value record is being constructed with the
   namespace and doc arguments swapped, causing the namespace to be lost; locate
   the creation of typedValueSchema (in createVariantShreddedObject / the block
   that calls HoodieSchema.createRecord with Variant.VARIANT_TYPED_VALUE_FIELD) 
and
   swap the second and third arguments so the namespace is passed as the second
   parameter and doc as the third (i.e., call
   HoodieSchema.createRecord(Variant.VARIANT_TYPED_VALUE_FIELD, namespace, null,
   typedValueFields) instead of
   HoodieSchema.createRecord(Variant.VARIANT_TYPED_VALUE_FIELD, null, namespace,
   typedValueFields)) to preserve qualification and avoid schema collisions.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:medusa:grasshopper:17576896-9233-4f28-812a-0d5a3e064d5f 
-->
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/12#discussion_r3036104556)) 
(source:comment#3036104556)



##########
hudi-spark-datasource/hudi-spark4.0.x/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala:
##########
@@ -253,10 +253,11 @@ private[sql] class AvroDeserializer(rootAvroType: Schema,
         val decimal = createDecimal(bigDecimal, d.getPrecision, d.getScale)
         updater.setDecimal(ordinal, decimal)
 
-      case (RECORD, VariantType) if avroType.getProp("logicalType") == 
HoodieSchema.VARIANT_TYPE_NAME =>
+      case (RECORD, VariantType) if avroType.getLogicalType != null
+        && avroType.getLogicalType.getName == HoodieSchema.VARIANT_TYPE_NAME =>

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   **Verify Avro logical type handling and consider adding a fallback property 
check.**
   
   The change to use `Schema.getLogicalType()` depends on the underlying Avro 
version's behavior. According to Apache Avro documentation, `getLogicalType()` 
returns a `LogicalType` instance only if:
   1. A `LogicalTypeFactory` for that type name was registered before the 
schema was parsed, OR
   2. The specific Avro version automatically constructs it from the schema's 
`logicalType` property
   
   Using a compile-time constant reference (`HoodieSchema.VARIANT_TYPE_NAME`) 
does not guarantee the `HoodieSchema` static initialization block runs before 
deserialization, which means custom logical type factories may not be 
registered when this code executes on schemas parsed externally (e.g., via 
plain `Schema.Parser`).
   
   Suggest confirming the Avro version's behavior with a regression test that 
parses Variant schemas via standard Avro parsing before Hudi's custom 
registration. If `getLogicalType()` can return null in that scenario, add the 
`getProp("logicalType")` fallback:
   
   ```diff
         case (RECORD, VariantType) if
   +       (avroType.getLogicalType != null
   +         && avroType.getLogicalType.getName == 
HoodieSchema.VARIANT_TYPE_NAME) ||
   -       avroType.getLogicalType != null
   -         && avroType.getLogicalType.getName == 
HoodieSchema.VARIANT_TYPE_NAME =>
   +       HoodieSchema.VARIANT_TYPE_NAME == avroType.getProp("logicalType") =>
   ```
   
   <!-- suggestion_start -->
   
   <details>
   <summary>📝 Committable suggestion</summary>
   
   > ‼️ **IMPORTANT**
   > Carefully review the code before committing. Ensure that it accurately 
replaces the highlighted code, contains no missing lines, and has no issues 
with indentation. Thoroughly test & benchmark the code to ensure it meets the 
requirements.
   
   ```suggestion
         case (RECORD, VariantType) if
           (avroType.getLogicalType != null
             && avroType.getLogicalType.getName == 
HoodieSchema.VARIANT_TYPE_NAME) ||
           HoodieSchema.VARIANT_TYPE_NAME == avroType.getProp("logicalType") =>
   ```
   
   </details>
   
   <!-- suggestion_end -->
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-spark-datasource/hudi-spark4.0.x/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala`
   around lines 256 - 257, The pattern in AvroDeserializer.scala that checks
   avroType.getLogicalType may miss schemas parsed before Hudi's logical-type
   factory registration; update the case (RECORD, VariantType) branch to also 
check
   avroType.getProp("logicalType") as a fallback (e.g., compare that string to
   HoodieSchema.VARIANT_TYPE_NAME) so Variant schemas parsed via plain
   Schema.Parser are handled; ensure the change is applied in the matching code
   path that currently uses avroType.getLogicalType and add/update a regression
   test that parses a Variant logicalType via Schema.Parser before Hudi
   registration to validate the fallback behavior.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:medusa:grasshopper:17576896-9233-4f28-812a-0d5a3e064d5f 
-->
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/12#discussion_r3036104564)) 
(source:comment#3036104564)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat(schema): Add support to write shredded variants for HoodieRecordType.SPARK [hudi]

Reply via email to