hudi-agent commented on code in PR #18539:
URL: https://github.com/apache/hudi/pull/18539#discussion_r3235347883


##########
hudi-flink-datasource/hudi-flink2.1.x/src/main/java/org/apache/hudi/table/format/cow/ParquetSplitReaderUtil.java:
##########
@@ -498,6 +509,23 @@ private static ColumnReader createColumnReader(
           }
         }
         return new RowColumnReader(fieldReaders);
+      case VARIANT:
+        // Top-level variant columns only. Nested variant (e.g. ARRAY<VARIANT>, ROW<..., VARIANT>)
+        // is not yet supported — see ColumnarGroupRowData.getVariant() / ColumnarGroupArrayData.getVariant().
+        //
+        // Variant is stored in Parquet as a group with binary "metadata" and "value" fields.
+        // VectorizedColumnBatch.getVariant() expects child order [value(0), metadata(1)]
+        // to match the BinaryVariant(value, metadata) constructor.
+        GroupType variantGroup = physicalType.asGroupType();

Review Comment:
   🤖 nit: `variantDescs` and `metaDescs` are each single-element lists whose only purpose is the immediate `.get(0)` call — the MAP case in the same switch inlines this as `descriptors.get(0)` directly. Could you collapse these to `ColumnDescriptor valueDesc = filterDescriptors(depth + 1, variantGroup.getType(valueIdx), columns).get(0);` (and same for `metaDesc`) to make it obvious only one descriptor is expected per binary child?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag quality.</i></sub>
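   If helpful, a self-contained sketch of the suggested inlining. The stub `filterDescriptors` below stands in for the real Parquet helper (an assumption for illustration only); the point is just the before/after shape of collapsing a single-element list into a direct `.get(0)`:

   ```java
   import java.util.List;

   public class CollapseSketch {
     // Stub standing in for ParquetSplitReaderUtil.filterDescriptors, which the
     // review notes always yields exactly one descriptor per binary child here.
     static List<String> filterDescriptors(int depth, String type, List<String> columns) {
       return List.of(type + "-descriptor"); // single-element list
     }

     public static void main(String[] args) {
       // Before: a named single-element list kept only for the immediate get(0).
       List<String> variantDescs = filterDescriptors(1, "value", List.of());
       String valueDescBefore = variantDescs.get(0);

       // After: inline the get(0), matching the MAP case's `descriptors.get(0)` style.
       String valueDesc = filterDescriptors(1, "value", List.of()).get(0);

       System.out.println(valueDesc.equals(valueDescBefore)); // prints "true"
     }
   }
   ```

   Both forms resolve the same descriptor; the collapsed form just removes the intermediate list variable.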



##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestVariantCrossEngineCompatibility.java:
##########
@@ -72,8 +73,37 @@ private void verifyFlinkCanReadSparkVariantTable(String tablePath, String tableT
             + "  'table.type' = '%s'"
             + ")",
         tablePath, tableType);
+  }
 
-    tableEnv.executeSql(createTableDdl);
+  /**
+   * On Flink 2.1+ verifies that Flink can fully read Spark 4.0 Variant tables.
+   * On pre-2.1 Flink verifies that reading fails with {@link UnsupportedOperationException}
+   * because native VariantType is not available.
+   */
+  private void verifyFlinkCanReadSparkVariantTable(String tablePath, String tableType, String testDescription) throws Exception {
+    TableEnvironment tableEnv = TestTableEnvs.getBatchTableEnv();
+    tableEnv.executeSql(createVariantTableDdl(tablePath, tableType));
+
+    boolean variantSupported;
+    try {
+      DataTypeAdapter.createVariantType();
+      variantSupported = true;
+    } catch (UnsupportedOperationException e) {
+      variantSupported = false;
+    }
+
+    if (!variantSupported) {
+      assertThrows(
+          Exception.class,

Review Comment:
   🤖 nit: this try/catch probe is duplicated verbatim from `TestHoodieSchemaConverter.hasNativeVariantType()` — could you extract a private static `hasNativeVariantType()` helper here as well so the two test classes stay in sync if the detection logic ever changes?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag quality.</i></sub>
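   A self-contained sketch of the extracted helper. The stub `createVariantType` below is a stand-in for `DataTypeAdapter.createVariantType()` (here hard-coded to throw, emulating pre-2.1 Flink), so the printed result reflects that assumption, not real version detection:

   ```java
   public class VariantProbeSketch {
     // Stub for DataTypeAdapter.createVariantType(): throws on pre-2.1 Flink,
     // where native VariantType is unavailable. Hard-coded to throw for this demo.
     static void createVariantType() {
       throw new UnsupportedOperationException("native VariantType requires Flink 2.1+");
     }

     // The suggested private static helper: one place to keep the detection
     // logic in sync across both test classes.
     static boolean hasNativeVariantType() {
       try {
         createVariantType();
         return true;
       } catch (UnsupportedOperationException e) {
         return false;
       }
     }

     public static void main(String[] args) {
       System.out.println(hasNativeVariantType()); // prints "false" with this stub
     }
   }
   ```

   With the helper extracted, both `ITTestVariantCrossEngineCompatibility` and `TestHoodieSchemaConverter` would call the same probe instead of duplicating the try/catch.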



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
