Davis Zhang created HUDI-8963:
---------------------------------

             Summary: Multi-writer schema evolution interleaved with compaction can have issues
                 Key: HUDI-8963
                 URL: https://issues.apache.org/jira/browse/HUDI-8963
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Davis Zhang


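Sequence that triggers the issue:

1. The first delta commit uses schema 1 and completes.
2. The second delta commit uses schema 2 (a valid schema evolution of schema 1) and is still inflight.
3. A compaction is requested (compaction.requested).
4. The second delta commit completes.
5. The compaction execution fails.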
{code:java}
drwxr-xr-x@ 2 zhanyeha  staff    64 Feb  5 17:36 history
-rw-r--r--@ 1 zhanyeha  staff     0 Feb  5 17:36 0011.deltacommit.requested
-rw-r--r--@ 1 zhanyeha  staff     0 Feb  5 17:36 0011.deltacommit.inflight
-rw-r--r--@ 1 zhanyeha  staff  4314 Feb  5 17:36 0011_20250205173626508.deltacommit
-rw-r--r--@ 1 zhanyeha  staff     0 Feb  5 17:36 0012.deltacommit.requested
-rw-r--r--@ 1 zhanyeha  staff  2795 Feb  5 17:36 0012.deltacommit.inflight
-rw-r--r--@ 1 zhanyeha  staff  4502 Feb  5 17:36 0012_20250205173628037.deltacommit
-rw-r--r--@ 1 zhanyeha  staff     0 Feb  5 17:36 0021.deltacommit.requested
-rw-r--r--@ 1 zhanyeha  staff   113 Feb  5 17:36 0021.deltacommit.inflight
-rw-r--r--@ 1 zhanyeha  staff     0 Feb  5 17:36 0031.deltacommit.requested
-rw-r--r--@ 1 zhanyeha  staff   113 Feb  5 17:36 0031.deltacommit.inflight
-rw-r--r--@ 1 zhanyeha  staff  3186 Feb  5 17:36 0032.compaction.requested
-rw-r--r--@ 1 zhanyeha  staff  3829 Feb  5 17:36 0021_20250205173628336.deltacommit
 {code}
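The interleaving can also be checked mechanically from the listing above. The sketch below is a hypothetical helper, not part of Hudi. It assumes the listing is in modification-time order, oldest first (the completed 0021 entry is printed last, which suggests this), and that file names follow the `<instant>.<action>.<state>` and `<instant>_<completionTime>.<action>` patterns visible above. It reports which delta commits were still pending when the compaction was requested.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hudi: replays timeline file names in
// modification-time order and reports delta commits that were still pending
// when the first compaction.requested file appeared.
public class TimelineInterleavingCheck {

    // Matches names such as "0021.deltacommit.inflight" (pending) and
    // "0021_20250205173628336.deltacommit" (completed, with completion time).
    private static final Pattern NAME = Pattern.compile(
        "^(\\d+)(?:_(\\d+))?\\.(deltacommit|compaction)(?:\\.(requested|inflight))?$");

    public static List<String> pendingAtCompactionRequest(List<String> filesInMtimeOrder) {
        Map<String, Boolean> deltaCommitDone = new LinkedHashMap<>(); // instant -> completed?
        for (String file : filesInMtimeOrder) {
            Matcher m = NAME.matcher(file);
            if (!m.matches()) {
                continue; // skip entries such as the "history" directory
            }
            String instant = m.group(1);
            String action = m.group(3);
            String state = m.group(4); // null for completed instants
            if (action.equals("compaction") && "requested".equals(state)) {
                List<String> pending = new ArrayList<>();
                deltaCommitDone.forEach((i, done) -> {
                    if (!done) {
                        pending.add(i);
                    }
                });
                return pending;
            }
            if (action.equals("deltacommit")) {
                if (m.group(2) != null) {
                    deltaCommitDone.put(instant, true); // completed file seen
                } else {
                    deltaCommitDone.putIfAbsent(instant, false); // requested or inflight
                }
            }
        }
        return List.of(); // no compaction was requested in the listing
    }

    public static void main(String[] args) {
        // File names from the listing, in the order shown.
        List<String> listing = List.of(
            "history",
            "0011.deltacommit.requested", "0011.deltacommit.inflight",
            "0011_20250205173626508.deltacommit",
            "0012.deltacommit.requested", "0012.deltacommit.inflight",
            "0012_20250205173628037.deltacommit",
            "0021.deltacommit.requested", "0021.deltacommit.inflight",
            "0031.deltacommit.requested", "0031.deltacommit.inflight",
            "0032.compaction.requested",
            "0021_20250205173628336.deltacommit");
        System.out.println(pendingAtCompactionRequest(listing)); // prints [0021, 0031]
    }
}
```

For this listing, 0021 and 0031 are both pending when 0032.compaction.requested appears, and 0021 completes afterwards, which is consistent with the sequence described above.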
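The failure is a NullPointerException in the projection step: Spark's generated UnsafeProjection throws while HoodieFileGroupReader is reading records for the compaction. Most likely, the schema used by the compaction writer does not match the schema of the data it reads, so the projection accesses data fields that do not exist.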
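As a simplified analogy in plain Java (not Spark's generated `UnsafeProjection`), the sketch below shows how projecting rows through a schema that does not match the data can surface as a bare NPE rather than a schema error. Spark's generated projections generally skip the null check for fields declared non-nullable, so a field that is absent from the data but non-nullable in the reader schema becomes a null dereference. The `Field` record, the `project` and `missingRequiredFields` methods, and the field names are all hypothetical; `missingRequiredFields` illustrates one way a reader could fail fast with a descriptive message instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical analogy, not Spark or Hudi code.
public class ProjectionMismatchDemo {

    // Reader-schema field: name plus whether the schema allows nulls.
    public record Field(String name, boolean nullable) {}

    // Mimics a generated projection that writes each field's UTF-8 bytes.
    public static byte[][] project(Map<String, String> row, List<Field> readerSchema) {
        byte[][] out = new byte[readerSchema.size()][];
        for (int i = 0; i < readerSchema.size(); i++) {
            Field f = readerSchema.get(i);
            String value = row.get(f.name()); // null if the data lacks the field
            if (f.nullable()) {
                // Nullable field: the null check keeps a missing value as null.
                out[i] = value == null ? null : value.getBytes(StandardCharsets.UTF_8);
            } else {
                // Non-nullable field: no null check, so a missing value throws an NPE.
                out[i] = value.getBytes(StandardCharsets.UTF_8);
            }
        }
        return out;
    }

    // Fail-fast alternative: list non-nullable reader fields absent from the data.
    public static List<String> missingRequiredFields(Map<String, String> row, List<Field> readerSchema) {
        List<String> missing = new ArrayList<>();
        for (Field f : readerSchema) {
            if (!f.nullable() && !row.containsKey(f.name())) {
                missing.add(f.name());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Row written with an older schema: it has no "city" field.
        Map<String, String> oldRow = Map.of("id", "k1", "name", "alice");
        // Reader schema that declares "city" as non-nullable.
        List<Field> readerSchema = List.of(
            new Field("id", false), new Field("name", true), new Field("city", false));

        System.out.println("missing: " + missingRequiredFields(oldRow, readerSchema)); // missing: [city]
        try {
            project(oldRow, readerSchema);
        } catch (NullPointerException e) {
            System.out.println("projection failed with NPE");
        }
    }
}
```

Whether the actual mismatch in this bug is a missing field, a nullability difference, or something else is not established by the report; the sketch only shows why such a mismatch tends to appear as an NPE deep inside the projection.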

 
{code:java}
    at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
    at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
    at org.apache.hudi.data.HoodieJavaRDD.collectAsList(HoodieJavaRDD.java:200)
    at org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:113)
    ... 136 more
Caused by: java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_4$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.next(RecordReaderIterator.scala:62)
    at org.apache.hudi.util.CloseableInternalRowIterator.next(CloseableInternalRowIterator.scala:57)
    at org.apache.hudi.util.CloseableInternalRowIterator.next(CloseableInternalRowIterator.scala:36)
    at org.apache.hudi.common.table.read.HoodieKeyBasedFileGroupRecordBuffer.doHasNext(HoodieKeyBasedFileGroupRecordBuffer.java:140)
    at org.apache.hudi.common.table.read.HoodieBaseFileGroupRecordBuffer.hasNext(HoodieBaseFileGroupRecordBuffer.java:160)
    at org.apache.hudi.common.table.read.HoodieFileGroupReader.hasNext(HoodieFileGroupReader.java:260)
    at org.apache.hudi.common.table.read.HoodieFileGroupReader$HoodieFileGroupReaderIterator.hasNext(HoodieFileGroupReader.java:331)
    at org.apache.hudi.io.HoodieSparkFileGroupReaderBasedMergeHandle.write(HoodieSparkFileGroupReaderBasedMergeHandle.java:203)
    at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.compactUsingFileGroupReader(HoodieSparkCopyOnWriteTable.java:281)
    at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:305)
    at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$8ace6636$1(HoodieCompactor.java:159)
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
