yihua commented on code in PR #12772:
URL: https://github.com/apache/hudi/pull/12772#discussion_r2074278055
##########
hudi-common/src/test/java/org/apache/hudi/common/table/timeline/versioning/BaseTestCommitMetadataSerDe.java:
##########
@@ -243,10 +243,9 @@ protected void testCorruptedJsonFile() {
Exception ex = assertThrows(IOException.class, () ->
serDe.deserialize(instant, new ByteArrayInputStream(serialized.get()), () ->
false, HoodieCommitMetadata.class));
if (serDe instanceof CommitMetadataSerDeV2) {
- assertEquals(ex.getCause().getMessage(), "Not an Avro data file.");
+ assertEquals("Not an Avro data file.", ex.getCause().getMessage());
} else {
- assertEquals(ex.getCause().getMessage(), "No content to map due to
end-of-input\n"
- + " at [Source: (ByteArrayInputStream); line: 1, column: 0]");
+ assertTrue(ex.getCause().getMessage().startsWith("No content to map due
to end-of-input"));
Review Comment:
Does the error message change? Should the exact error message be asserted
based on the Spark version?
##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/commit/DatasetBulkInsertCommitActionExecutor.java:
##########
@@ -66,6 +66,9 @@ protected Option<HoodieData<WriteStatus>>
doExecute(Dataset<Row> records, boolea
if (HoodieSparkUtils.isSpark3()) {
targetFormat = "org.apache.hudi.spark3.internal";
customOpts.put(HoodieInternalConfig.BULKINSERT_INPUT_DATA_SCHEMA_DDL.key(),
records.schema().json());
+ } else if (HoodieSparkUtils.isSpark4()) {
+ targetFormat = "org.apache.hudi.spark4.internal";
+
customOpts.put(HoodieInternalConfig.BULKINSERT_INPUT_DATA_SCHEMA_DDL.key(),
records.schema().json());
Review Comment:
Is there any Datasource v2 API change between Spark 3 and 4? If not, we
should reuse the classes for bulk insert using Datasource v2 implementation.
##########
hudi-io/pom.xml:
##########
@@ -143,5 +143,11 @@
<version>${slf4j.version}</version>
<scope>provided</scope>
</dependency>
+
+ <dependency>
+ <groupId>com.google.code.findbugs</groupId>
Review Comment:
What is this used for in `hudi-io` module?
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/FileGroupRecordBuffer.java:
##########
@@ -351,7 +351,7 @@ protected Option<DeleteRecord>
doProcessNextDeletedRecord(DeleteRecord deleteRec
// because we use 0 as the default value which means natural order
boolean chooseExisting = !deleteOrderingVal.equals(0)
&& ReflectionUtils.isSameClass(existingOrderingVal,
deleteOrderingVal)
- && existingOrderingVal.compareTo(deleteOrderingVal) > 0;
+ && readerContext.compareValues(existingOrderingVal,
deleteOrderingVal) > 0;
Review Comment:
Similar here on the order value comparison.
##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/commit/DatasetBulkInsertCommitActionExecutor.java:
##########
@@ -66,6 +66,9 @@ protected Option<HoodieData<WriteStatus>>
doExecute(Dataset<Row> records, boolea
if (HoodieSparkUtils.isSpark3()) {
targetFormat = "org.apache.hudi.spark3.internal";
customOpts.put(HoodieInternalConfig.BULKINSERT_INPUT_DATA_SCHEMA_DDL.key(),
records.schema().json());
+ } else if (HoodieSparkUtils.isSpark4()) {
+ targetFormat = "org.apache.hudi.spark4.internal";
+
customOpts.put(HoodieInternalConfig.BULKINSERT_INPUT_DATA_SCHEMA_DDL.key(),
records.schema().json());
Review Comment:
And possibly rename the package: `org.apache.hudi.spark3.internal` ->
`org.apache.hudi.spark.internal` to be shared. The reason that `spark3` is
used in the package name is because those Hudi classes are added when there is
both Spark 2 and 3 support. Now all Spark versions Hudi supports have row
writer/DS v2 support.
##########
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieReaderContext.java:
##########
@@ -361,6 +361,17 @@ public Comparable convertValueToEngineType(Comparable
value) {
return value;
}
+ /**
+ * Compares two values.
+ * As [SPARK-46832] UTF8String doesn't support compareTo anymore, we need to
have engine-specific implementations.
+ * @param a first value that is compared to
+ * @param b second value
+ * @return a negative integer, zero, or a positive integer as first object
is less than, equal to, or greater than the second.
+ */
+ public int compareValues(Comparable a, Comparable b) {
+ return a.compareTo(b);
+ }
Review Comment:
Similarly, could we wrap `UTF8String` with `HoodieUTF8String` for ordering
value comparison to get rid of this API?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]