Davis Zhang created HUDI-8963:
---------------------------------
Summary: Multi-writer schema evolution interleaved with compaction can have issues
Key: HUDI-8963
URL: https://issues.apache.org/jira/browse/HUDI-8963
Project: Apache Hudi
Issue Type: Bug
Reporter: Davis Zhang
Sequence of events that triggers the failure:
The first delta commit uses schema 1 and is committed.
The second delta commit uses schema 2 (a valid schema evolution) and goes inflight.
A compaction is requested (compaction.requested).
The second delta commit finishes.
Compaction execution then fails.
{code:java}
drwxr-xr-x@ 2 zhanyeha staff 64 Feb 5 17:36 history
-rw-r--r--@ 1 zhanyeha staff 0 Feb 5 17:36 0011.deltacommit.requested
-rw-r--r--@ 1 zhanyeha staff 0 Feb 5 17:36 0011.deltacommit.inflight
-rw-r--r--@ 1 zhanyeha staff 4314 Feb 5 17:36 0011_20250205173626508.deltacommit
-rw-r--r--@ 1 zhanyeha staff 0 Feb 5 17:36 0012.deltacommit.requested
-rw-r--r--@ 1 zhanyeha staff 2795 Feb 5 17:36 0012.deltacommit.inflight
-rw-r--r--@ 1 zhanyeha staff 4502 Feb 5 17:36 0012_20250205173628037.deltacommit
-rw-r--r--@ 1 zhanyeha staff 0 Feb 5 17:36 0021.deltacommit.requested
-rw-r--r--@ 1 zhanyeha staff 113 Feb 5 17:36 0021.deltacommit.inflight
-rw-r--r--@ 1 zhanyeha staff 0 Feb 5 17:36 0031.deltacommit.requested
-rw-r--r--@ 1 zhanyeha staff 113 Feb 5 17:36 0031.deltacommit.inflight
-rw-r--r--@ 1 zhanyeha staff 3186 Feb 5 17:36 0032.compaction.requested
-rw-r--r--@ 1 zhanyeha staff 3829 Feb 5 17:36 0021_20250205173628336.deltacommit
{code}
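The interleaving can be modeled with a toy timeline to show why a schema resolved when the compaction is requested may differ from one resolved when it executes. This is only a sketch of the ordering in this report: the Event class and the latestCommittedSchema helper are hypothetical and are not Hudi's timeline API, and where Hudi actually resolves the compaction schema is not established here.

```java
import java.util.Arrays;
import java.util.List;

public class TimelineInterleavingSketch {

    /** Hypothetical event model for illustration; this is not Hudi's timeline API. */
    public static final class Event {
        final String description;
        // Schema version made visible when a delta commit completes; 0 for all other events.
        final int committedSchema;

        Event(String description, int committedSchema) {
            this.description = description;
            this.committedSchema = committedSchema;
        }
    }

    /** The ordering described in this report, oldest event first. */
    public static List<Event> reportedScenario() {
        return Arrays.asList(
            new Event("delta commit 1 completes with schema 1", 1),
            new Event("delta commit 2 goes inflight with schema 2", 0),
            new Event("compaction is requested", 0),
            new Event("delta commit 2 completes with schema 2", 2),
            new Event("compaction executes", 0));
    }

    /** Schema version of the latest delta commit completed before events[upTo]; 0 if none. */
    public static int latestCommittedSchema(List<Event> events, int upTo) {
        int latest = 0;
        for (int i = 0; i < upTo; i++) {
            if (events.get(i).committedSchema != 0) {
                latest = events.get(i).committedSchema;
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Event> events = reportedScenario();
        int atRequest = latestCommittedSchema(events, 2);   // events before "compaction is requested"
        int atExecution = latestCommittedSchema(events, 4); // events before "compaction executes"
        System.out.println("latest committed schema at compaction request: " + atRequest);
        System.out.println("latest committed schema at compaction execution: " + atExecution);
    }
}
```

In this model the two lookups return different schema versions, which is the window in which a compaction plan and the data it later reads could disagree.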
The failure is a NullPointerException in the projection step. Most likely the
compaction writer schema does not match the data being compacted, so the
projection accesses fields that do not exist in the data.
{code:java}
at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
at org.apache.hudi.data.HoodieJavaRDD.collectAsList(HoodieJavaRDD.java:200)
at org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:113)
... 136 more
Caused by: java.lang.NullPointerException
at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_4$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.next(RecordReaderIterator.scala:62)
at org.apache.hudi.util.CloseableInternalRowIterator.next(CloseableInternalRowIterator.scala:57)
at org.apache.hudi.util.CloseableInternalRowIterator.next(CloseableInternalRowIterator.scala:36)
at org.apache.hudi.common.table.read.HoodieKeyBasedFileGroupRecordBuffer.doHasNext(HoodieKeyBasedFileGroupRecordBuffer.java:140)
at org.apache.hudi.common.table.read.HoodieBaseFileGroupRecordBuffer.hasNext(HoodieBaseFileGroupRecordBuffer.java:160)
at org.apache.hudi.common.table.read.HoodieFileGroupReader.hasNext(HoodieFileGroupReader.java:260)
at org.apache.hudi.common.table.read.HoodieFileGroupReader$HoodieFileGroupReaderIterator.hasNext(HoodieFileGroupReader.java:331)
at org.apache.hudi.io.HoodieSparkFileGroupReaderBasedMergeHandle.write(HoodieSparkFileGroupReaderBasedMergeHandle.java:203)
at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.compactUsingFileGroupReader(HoodieSparkCopyOnWriteTable.java:281)
at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:305)
at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$8ace6636$1(HoodieCompactor.java:159)
{code}
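The suspected mechanism, a projection that expects fields the stored data does not carry, can be illustrated in plain Java. This is an analogy under stated assumptions, not Spark's generated UnsafeProjection or Hudi's reader: the field names (id, name, new_col) and both helper methods are hypothetical, and which side of the mismatch is missing the field in the real failure is not confirmed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchemaMismatchSketch {

    /** Fields the projection expects but the stored row does not carry. */
    public static List<String> missingFields(List<String> expectedFields, Map<String, Object> row) {
        List<String> missing = new ArrayList<>();
        for (String field : expectedFields) {
            if (!row.containsKey(field)) {
                missing.add(field);
            }
        }
        return missing;
    }

    /** Naive projection that assumes every expected field is present in the row. */
    public static int[] projectLengths(List<String> expectedFields, Map<String, Object> row) {
        int[] lengths = new int[expectedFields.size()];
        for (int i = 0; i < lengths.length; i++) {
            // row.get returns null for an absent field, so toString() throws NullPointerException.
            lengths[i] = row.get(expectedFields.get(i)).toString().length();
        }
        return lengths;
    }

    public static void main(String[] args) {
        // Hypothetical row written under the older schema (no new_col).
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1);
        row.put("name", "a");

        // Hypothetical evolved schema that adds new_col.
        List<String> evolvedFields = Arrays.asList("id", "name", "new_col");

        System.out.println("missing fields: " + missingFields(evolvedFields, row));
        try {
            projectLengths(evolvedFields, row);
        } catch (NullPointerException e) {
            System.out.println("naive projection failed with NullPointerException");
        }
    }
}
```

A check along the lines of missingFields, run before projecting, would turn the opaque NullPointerException into an explicit schema mismatch report.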