the-other-tim-brown commented on code in PR #17833:
URL: https://github.com/apache/hudi/pull/17833#discussion_r2699311648


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkInternalSchemaConverter.java:
##########
@@ -490,6 +490,10 @@ public static boolean convertColumnVectorType(WritableColumnVector oldV, Writabl
         return convertDateType(oldV, newV, newType, len);
       } else if (oldType instanceof TimestampType) {
         return false;
+      } else if (oldType instanceof StructType) {

Review Comment:
   Will this accidentally catch all structs?
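One way to address the concern would be to narrow the guard so that only structs with the Variant shape (exactly a binary `value` field and a binary `metadata` field) take this branch, letting all other structs fall through. A minimal stdlib-only sketch of such a shape check; the `isVariantShaped` helper and the use of plain field-name lists are illustrative, not Hudi's or Spark's actual `StructType` API:

```java
import java.util.Arrays;
import java.util.List;

public class VariantShapeCheck {
  // Illustrative stand-in: in the real converter the names would come from
  // the StructType's fields, not a plain list of strings.
  public static boolean isVariantShaped(List<String> structFieldNames) {
    // Only a two-field struct with exactly `value` and `metadata` should be
    // treated as a Variant; every other struct keeps its current handling.
    return structFieldNames.size() == 2
        && structFieldNames.contains("value")
        && structFieldNames.contains("metadata");
  }

  public static void main(String[] args) {
    System.out.println(isVariantShaped(Arrays.asList("value", "metadata"))); // true
    System.out.println(isVariantShaped(Arrays.asList("a", "b")));            // false
  }
}
```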



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkParquetReader.java:
##########
@@ -45,7 +45,7 @@
 import org.apache.parquet.schema.MessageType;
 import org.apache.parquet.schema.SchemaRepair;
 import org.apache.spark.sql.HoodieInternalRowUtils;
-import org.apache.spark.sql.avro.HoodieSparkSchemaConverters;
+import org.apache.spark.sql.avro.HoodieSparkSchemaConverters$;

Review Comment:
   Nit: does this need to change?



##########
hudi-common/src/test/java/org/apache/hudi/common/testutils/ZipTestUtils.java:
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.testutils;
+
+import java.io.BufferedInputStream;
+import java.io.FileOutputStream;
+import java.io.InputStream;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.stream.Stream;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipInputStream;
+
+/**
+ * Utility methods for test zip file operations.
+ */
+public class ZipTestUtils {
+
+  /**
+   * Extracts a zip file from resources to a temporary directory and returns the path to the table directory.
+   * Zip files are created using: cd $wd && zip -r variant_cow.zip variant_cow
+   *
+   * @param resourceZipPath Path to the zip file in resources (e.g., "/variant_backward_compat/variant_cow.zip")
+   * @param targetDir Target directory where the zip contents will be extracted
+   * @return Path to the extracted table directory (the first subdirectory in the extracted contents)
+   */
+  public static String extractZipFromResources(String resourceZipPath, Path targetDir) throws Exception {

Review Comment:
   We have a `HoodieTestUtils#extractZipToDirectory` - can we reuse that?
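For reference, the core of such a helper needs only `java.util.zip` (the same imports the quoted file already pulls in). A minimal self-contained sketch with a zip-slip guard; the `extractZip` name and signature are illustrative, not the existing `HoodieTestUtils` API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipExtractSketch {
  /** Extracts every entry of the given zip stream under targetDir. */
  public static void extractZip(InputStream zipStream, Path targetDir) throws IOException {
    try (ZipInputStream zis = new ZipInputStream(zipStream)) {
      ZipEntry entry;
      while ((entry = zis.getNextEntry()) != null) {
        Path out = targetDir.resolve(entry.getName()).normalize();
        // Zip-slip guard: refuse entries that would escape the target directory.
        if (!out.startsWith(targetDir.normalize())) {
          throw new IOException("Bad zip entry: " + entry.getName());
        }
        if (entry.isDirectory()) {
          Files.createDirectories(out);
        } else {
          Files.createDirectories(out.getParent());
          Files.copy(zis, out);
        }
        zis.closeEntry();
      }
    }
  }
}
```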



##########
hudi-common/src/test/resources/variant_backward_compat/variant_cow.zip:
##########


Review Comment:
   Let's add a README to the directory to explain how the zip files are generated.



##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/InternalSchemaConverter.java:
##########
@@ -346,6 +346,18 @@ private static Type visitPrimitiveToBuildInternalType(HoodieSchema schema) {
         return Types.DateType.get();
       case NULL:
         return null;
+      case VARIANT:
+        // Variant is represented as a record with value and metadata binary fields
+        // Convert it to the internal schema representation as a RecordType
+        // Since Variant is treated as a primitive here but needs to be a record,
+        // we return a RecordType with the appropriate structure
+        List<Types.Field> variantFields = new ArrayList<>();
+        // Assign field IDs: these are used for schema evolution tracking
+        // Use negative IDs: indicate these are system-generated for Variant type
+        // TODO (voon): Check if we can remove the magic numbers?

Review Comment:
   Who can we reach out to in order to figure out how we should handle this?
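For context on what the hunk builds: a two-field record, a binary `value` holding the encoded datum and a binary `metadata` holding its dictionary, with negative field IDs so the system-generated fields cannot collide with the positive ID space used by real user columns. A stdlib-only sketch of that structure; the `Field` class and the specific IDs -1/-2 are illustrative stand-ins, not Hudi's `Types.Field` API or its actual magic numbers:

```java
import java.util.ArrayList;
import java.util.List;

public class VariantRecordSketch {
  /** Illustrative stand-in for an internal-schema field: id, name, type. */
  public static final class Field {
    public final int id;
    public final String name;
    public final String type;
    public Field(int id, String name, String type) {
      this.id = id; this.name = name; this.type = type;
    }
  }

  /** Builds the two synthetic fields a Variant is mapped to as a record. */
  public static List<Field> variantFields() {
    // Capacity 2: the record always has exactly `value` and `metadata`.
    List<Field> fields = new ArrayList<>(2);
    // Negative IDs mark the fields as system-generated for Variant, keeping
    // them out of the positive ID space used during schema evolution.
    fields.add(new Field(-1, "value", "binary"));
    fields.add(new Field(-2, "metadata", "binary"));
    return fields;
  }
}
```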



##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2063,8 +2063,12 @@ private static boolean isColumnTypeSupportedV1(HoodieSchema schema, Option<Hoodi
   private static boolean isColumnTypeSupportedV2(HoodieSchema schema) {
     HoodieSchemaType type = schema.getType();
     // Check for precision and scale if the schema has a logical decimal type.
+    // VARIANT (unshredded) type is excluded because it stores semi-structured data as opaque binary blobs,
+    // making min/max statistics meaningless
+    // TODO: For shredded, we are able to store colstats, explore that

Review Comment:
   Let's make a GH Issue and then link it here so we don't forget
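The intuition behind the exclusion in the hunk: for unshredded Variant the stored bytes are an encoding, so byte-wise min/max does not track the logical ordering of the values, which makes the stats useless for pruning. A toy stdlib-only illustration, using string encodings of integers as a stand-in for the actual Variant binary encoding:

```java
import java.nio.charset.StandardCharsets;

public class BinaryStatsSketch {
  /** Unsigned byte-wise comparison, as a naive stats collector would do. */
  public static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  public static void main(String[] args) {
    byte[] two = "2".getBytes(StandardCharsets.UTF_8);   // logical value 2
    byte[] ten = "10".getBytes(StandardCharsets.UTF_8);  // logical value 10
    // Logically 2 < 10, but byte-wise "10" sorts before "2": a min/max
    // computed over the encoded bytes misorders the logical values.
    System.out.println(compareBytes(two, ten) > 0); // true
  }
}
```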



##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/InternalSchemaConverter.java:
##########
@@ -346,6 +346,18 @@ private static Type visitPrimitiveToBuildInternalType(HoodieSchema schema) {
         return Types.DateType.get();
       case NULL:
         return null;
+      case VARIANT:
+        // Variant is represented as a record with value and metadata binary fields
+        // Convert it to the internal schema representation as a RecordType
+        // Since Variant is treated as a primitive here but needs to be a record,
+        // we return a RecordType with the appropriate structure
+        List<Types.Field> variantFields = new ArrayList<>();

Review Comment:
   supernit: initialize the list with a capacity of 2, i.e. `new ArrayList<>(2)`, since it always holds exactly two fields



##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/util/HoodieSchemaConverter.java:
##########
@@ -445,6 +447,26 @@ private static DataType convertUnion(HoodieSchema schema) {
     return nullable ? rawDataType.nullable() : rawDataType;
   }
 
+  /**
+   * Converts a Variant schema to Flink's ROW type.
+   * Variant is represented as ROW<`value` BYTES, `metadata` BYTES> in Flink.
+   * // TODO: We are only supporting shredded for now

Review Comment:
   This should say `unshredded`?
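For readers unfamiliar with the mapping in the quoted javadoc: in Flink's type-string notation the Variant comes out as a two-field row of byte arrays. A stdlib-only sketch that renders that shape without a Flink dependency; the `rowType` helper is illustrative, not Flink's `DataTypes` API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlinkRowSketch {
  /** Renders a Flink-style ROW type string from ordered name -> type pairs. */
  public static String rowType(Map<String, String> fields) {
    StringBuilder sb = new StringBuilder("ROW<");
    boolean first = true;
    for (Map.Entry<String, String> e : fields.entrySet()) {
      if (!first) {
        sb.append(", ");
      }
      sb.append('`').append(e.getKey()).append("` ").append(e.getValue());
      first = false;
    }
    return sb.append('>').toString();
  }

  public static void main(String[] args) {
    // The Variant mapping described in the javadoc: value and metadata as BYTES.
    Map<String, String> variant = new LinkedHashMap<>();
    variant.put("value", "BYTES");
    variant.put("metadata", "BYTES");
    System.out.println(rowType(variant)); // ROW<`value` BYTES, `metadata` BYTES>
  }
}
```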



##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSchemaConversionUtils.scala:
##########
@@ -18,17 +18,17 @@
 
 package org.apache.hudi
 
-import org.apache.avro.generic.GenericRecord
 import org.apache.hudi.HoodieSparkUtils.sparkAdapter
 import org.apache.hudi.common.schema.{HoodieSchema, HoodieSchemaType, HoodieSchemaUtils}
 import org.apache.hudi.internal.schema.HoodieSchemaException
 
-import org.apache.avro.{AvroRuntimeException, Schema}
+import org.apache.avro.AvroRuntimeException

Review Comment:
   nit: just for the sake of keeping the PR lean, let's remove any import fixes from this PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
