[
https://issues.apache.org/jira/browse/HUDI-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Davis Zhang updated HUDI-8963:
------------------------------
Fix Version/s: 1.0.2
> Multi-writer schema evolution interleaved with compaction can have issues
> -------------------------------------------------------------------------
>
> Key: HUDI-8963
> URL: https://issues.apache.org/jira/browse/HUDI-8963
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Davis Zhang
> Priority: Major
> Fix For: 1.0.2
>
>
> The first delta commit uses schema 1 and is committed.
> The second delta commit uses schema 2 (a valid schema evolution) and goes inflight.
> Compaction is scheduled (compaction.requested).
> The second delta commit finishes.
> Compaction execution then fails.
> {code:java}
> drwxr-xr-x@ 2 zhanyeha staff 64 Feb 5 17:36 history
> -rw-r--r--@ 1 zhanyeha staff 0 Feb 5 17:36 0011.deltacommit.requested
> -rw-r--r--@ 1 zhanyeha staff 0 Feb 5 17:36 0011.deltacommit.inflight
> -rw-r--r--@ 1 zhanyeha staff 4314 Feb 5 17:36 0011_20250205173626508.deltacommit
> -rw-r--r--@ 1 zhanyeha staff 0 Feb 5 17:36 0012.deltacommit.requested
> -rw-r--r--@ 1 zhanyeha staff 2795 Feb 5 17:36 0012.deltacommit.inflight
> -rw-r--r--@ 1 zhanyeha staff 4502 Feb 5 17:36 0012_20250205173628037.deltacommit
> -rw-r--r--@ 1 zhanyeha staff 0 Feb 5 17:36 0021.deltacommit.requested
> -rw-r--r--@ 1 zhanyeha staff 113 Feb 5 17:36 0021.deltacommit.inflight
> -rw-r--r--@ 1 zhanyeha staff 0 Feb 5 17:36 0031.deltacommit.requested
> -rw-r--r--@ 1 zhanyeha staff 113 Feb 5 17:36 0031.deltacommit.inflight
> -rw-r--r--@ 1 zhanyeha staff 3186 Feb 5 17:36 0032.compaction.requested
> -rw-r--r--@ 1 zhanyeha staff 3829 Feb 5 17:36 0021_20250205173628336.deltacommit
> {code}
> The error is an NPE in the projection step. Most likely the compaction
> writer schema does not match the data it processes, so the projection
> accesses fields that do not exist in the data.
>
> {code:java}
> at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
> at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
> at org.apache.hudi.data.HoodieJavaRDD.collectAsList(HoodieJavaRDD.java:200)
> at org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:113)
> ... 136 more
> Caused by: java.lang.NullPointerException
> at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_4$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
> at org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.next(RecordReaderIterator.scala:62)
> at org.apache.hudi.util.CloseableInternalRowIterator.next(CloseableInternalRowIterator.scala:57)
> at org.apache.hudi.util.CloseableInternalRowIterator.next(CloseableInternalRowIterator.scala:36)
> at org.apache.hudi.common.table.read.HoodieKeyBasedFileGroupRecordBuffer.doHasNext(HoodieKeyBasedFileGroupRecordBuffer.java:140)
> at org.apache.hudi.common.table.read.HoodieBaseFileGroupRecordBuffer.hasNext(HoodieBaseFileGroupRecordBuffer.java:160)
> at org.apache.hudi.common.table.read.HoodieFileGroupReader.hasNext(HoodieFileGroupReader.java:260)
> at org.apache.hudi.common.table.read.HoodieFileGroupReader$HoodieFileGroupReaderIterator.hasNext(HoodieFileGroupReader.java:331)
> at org.apache.hudi.io.HoodieSparkFileGroupReaderBasedMergeHandle.write(HoodieSparkFileGroupReaderBasedMergeHandle.java:203)
> at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.compactUsingFileGroupReader(HoodieSparkCopyOnWriteTable.java:281)
> at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:305)
> at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$8ace6636$1(HoodieCompactor.java:159)
> {code}
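
The mismatch suspected in the report can be illustrated with a minimal, self-contained sketch. This is not Hudi or Spark code: the class `SchemaMismatchSketch`, its methods, and the field names `id`, `name`, and `city` are all hypothetical. It only shows the general failure mode, assuming a projection driven by a schema that names a field the record was never written with, followed by a writer that expects every value to be present. Which schema the compaction actually picked up is not established by the report.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaMismatchSketch {

    // Hypothetical stand-in for a projection: look up each schema field in the record.
    // A field the record was never written with comes back as null.
    static List<Object> project(List<String> schema, Map<String, Object> record) {
        List<Object> row = new ArrayList<>();
        for (String field : schema) {
            row.add(record.get(field));
        }
        return row;
    }

    // Hypothetical stand-in for a writer that assumes every projected value is non-null.
    static int writeRow(List<Object> row) {
        int written = 0;
        for (Object value : row) {
            written += value.toString().length(); // throws NullPointerException on a missing field
        }
        return written;
    }

    // Returns true when projecting and writing the record with the given schema hits an NPE.
    static boolean failsWithNpe(List<String> schema, Map<String, Object> record) {
        try {
            writeRow(project(schema, record));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> schema1 = List.of("id", "name");          // assumed original schema
        List<String> schema2 = List.of("id", "name", "city");  // assumed evolved schema (one added field)

        // A record written under schema 1: it carries no "city" value.
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1);
        record.put("name", "a");

        System.out.println("schema 1 fails: " + failsWithNpe(schema1, record));
        System.out.println("schema 2 fails: " + failsWithNpe(schema2, record));
    }
}
```

In this sketch the record projects cleanly under the schema it was written with and fails only under the schema that names the extra field, which is the kind of writer-schema/data disagreement the report hypothesizes.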
--
This message was sent by Atlassian Jira
(v8.20.10#820010)