xushiyan commented on code in PR #5378:
URL: https://github.com/apache/hudi/pull/5378#discussion_r854721036
##########
hudi-spark-datasource/hudi-spark3.1.x/src/main/scala/org/apache/spark/sql/adapter/Spark3_1Adapter.scala:
##########
@@ -23,7 +23,7 @@ import org.apache.spark.SPARK_VERSION
import org.apache.spark.sql.avro.{HoodieAvroDeserializer, HoodieAvroSerializer, HoodieSpark3_1AvroDeserializer, HoodieSpark3_1AvroSerializer}
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.execution.datasources.parquet.{ParquetFileFormat, Spark312HoodieParquetFileFormat}
+import org.apache.spark.sql.execution.datasources.parquet.{ParquetFileFormat, Spark31HoodieParquetFileFormat}
Review Comment:
This renaming seems fine. @xiaorixiaoyao, is there any particular reason it was only meant for Spark 3.1.2?
##########
hudi-common/src/main/java/org/apache/hudi/common/util/ReflectionUtils.java:
##########
@@ -85,11 +87,14 @@ public static <T extends HoodieRecordPayload> T loadPayload(String recordPayload
   * Creates an instance of the given class. Use this version when dealing with interface types as constructor args.
   */
  public static Object loadClass(String clazz, Class<?>[] constructorArgTypes, Object... constructorArgs) {
-    try {
-      return getClass(clazz).getConstructor(constructorArgTypes).newInstance(constructorArgs);
-    } catch (InstantiationException | IllegalAccessException | InvocationTargetException | NoSuchMethodException e) {
-      throw new HoodieException("Unable to instantiate class " + clazz, e);
-    }
+    return newInstanceUnchecked(getClass(clazz), constructorArgTypes, constructorArgs);
+  }
+
+  /**
+   * Creates an instance of the given class. Constructor arg types are inferred.
+   */
+  public static Object loadClass(String clazz, Object... constructorArgs) {
+    return newInstanceUnchecked(getClass(clazz), constructorArgs);
Review Comment:
This affects a lot of code paths. Can't we make a specialized one just for this patch and leave the original as is? We should mitigate the risk here.
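
For illustration, a patch-scoped helper could provide the inferred-types behavior without touching the existing overloads; a minimal sketch, assuming we keep `HoodieException` for the wrapped error (the name `newInstanceWithInferredTypes` is hypothetical, not from this patch):

```java
import java.lang.reflect.InvocationTargetException;

import org.apache.hudi.exception.HoodieException;

public final class ReflectionSketch {

  // Hypothetical patch-local helper: infers constructor parameter types from
  // the runtime classes of the supplied arguments, leaving the existing
  // loadClass overloads byte-for-byte unchanged.
  public static Object newInstanceWithInferredTypes(Class<?> klass, Object... args) {
    Class<?>[] argTypes = new Class<?>[args.length];
    for (int i = 0; i < args.length; i++) {
      argTypes[i] = args[i].getClass();
    }
    try {
      return klass.getConstructor(argTypes).newInstance(args);
    } catch (InstantiationException | IllegalAccessException
        | InvocationTargetException | NoSuchMethodException e) {
      throw new HoodieException("Unable to instantiate class " + klass.getName(), e);
    }
  }
}
```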
##########
hudi-common/src/main/java/org/apache/hudi/common/util/ReflectionUtils.java:
##########
@@ -85,11 +87,14 @@ public static <T extends HoodieRecordPayload> T loadPayload(String recordPayload
   * Creates an instance of the given class. Use this version when dealing with interface types as constructor args.
   */
  public static Object loadClass(String clazz, Class<?>[] constructorArgTypes, Object... constructorArgs) {
-    try {
-      return getClass(clazz).getConstructor(constructorArgTypes).newInstance(constructorArgs);
-    } catch (InstantiationException | IllegalAccessException | InvocationTargetException | NoSuchMethodException e) {
-      throw new HoodieException("Unable to instantiate class " + clazz, e);
-    }
+    return newInstanceUnchecked(getClass(clazz), constructorArgTypes, constructorArgs);
+  }
+
+  /**
+   * Creates an instance of the given class. Constructor arg types are inferred.
+   */
+  public static Object loadClass(String clazz, Object... constructorArgs) {
+    return newInstanceUnchecked(getClass(clazz), constructorArgs);
Review Comment:
I don't even see `newInstanceUnchecked` being used in this patch. Can we revert the change, or create a special one if you need it here?
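
To make the risk concrete: `Class.getConstructor` matches declared parameter types exactly, so inferring types from the arguments' runtime classes fails whenever a constructor declares an interface or supertype as a parameter (which is exactly why the existing overload's javadoc says to pass explicit arg types). A standalone sketch of the failure mode, with illustrative class names:

```java
import java.util.ArrayList;
import java.util.List;

public class InferenceGap {

  // The constructor declares the interface type List, not ArrayList.
  public InferenceGap(List<String> names) {
  }

  public static void main(String[] args) throws Exception {
    Object arg = new ArrayList<String>();
    // The inferred type is ArrayList, but no constructor takes ArrayList, so
    // this throws NoSuchMethodException even though `new InferenceGap(...)`
    // would compile fine with the same argument.
    InferenceGap.class.getConstructor(arg.getClass()).newInstance(arg);
  }
}
```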
##########
hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark32HoodieParquetFileFormat.scala:
##########
@@ -225,6 +239,10 @@ class Spark32HoodieParquetFileFormat(private val shouldAppendPartitionValues: Bo
if (enableVectorizedReader) {
val vectorizedReader =
if (shouldUseInternalSchema) {
+            val int96RebaseSpec =
+              DataSourceUtils.int96RebaseSpec(footerFileMetaData.getKeyValueMetaData.get, int96RebaseModeInRead)
+            val datetimeRebaseSpec =
+              DataSourceUtils.datetimeRebaseSpec(footerFileMetaData.getKeyValueMetaData.get, datetimeRebaseModeInRead)
Review Comment:
This implies schema evolution only works with Spark 3.2.1? cc @xiaorixiaoyao
##########
hudi-spark-datasource/hudi-spark3.1.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark31HoodieParquetFileFormat.scala:
##########
@@ -61,7 +61,7 @@ import java.net.URI
* <li>Schema on-read</li>
* </ol>
*/
-class Spark312HoodieParquetFileFormat(private val shouldAppendPartitionValues: Boolean) extends ParquetFileFormat {
+class Spark31HoodieParquetFileFormat(private val shouldAppendPartitionValues: Boolean) extends ParquetFileFormat {
Review Comment:
The changes to the Spark 3.1 parquet format are purely refactoring, right? Looks OK, but I would rather be super paranoid at this stage about introducing more diffs than needed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]