juntaozhang opened a new issue, #7299:
URL: https://github.com/apache/paimon/issues/7299

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.
   
   
   ### Paimon version
   
   main
   
   ### Compute Engine
   
   spark 3.5
   
   ### Minimal reproduce step
   
   ```
   spark-sql (default)> CREATE TABLE default.t (
                      >     `t1` string ,
                      >     `t2` string ,
                      >     `t3` string
                      > ) PARTITIONED BY (`date` string)
                      > TBLPROPERTIES (
                      >   'chain-table.enabled' = 'true',
                      >   -- props about primary key table
                      >   'primary-key' = 'date,t1',
                      >   'sequence.field' = 't2',
                      >   'bucket-key' = 't1',
                      >   'bucket' = '2',
                      >   -- props about partition
                      >   'partition.timestamp-pattern' = '$date',
                      >   'partition.timestamp-formatter' = 'yyyyMMdd'
                      > );
   26/02/24 13:05:12 WARN Mimetypes: Unable to find 'mime.types' file in classpath
   Time taken: 0.823 seconds
   spark-sql (default)> CALL sys.create_branch('default.t', 'snapshot');
   true
   Time taken: 0.725 seconds, Fetched 1 row(s)
   spark-sql (default)>
                      > CALL sys.create_branch('default.t', 'delta');
   true
   Time taken: 0.441 seconds, Fetched 1 row(s)
   spark-sql (default)> ALTER TABLE default.t SET tblproperties
                      >     ('scan.fallback-snapshot-branch' = 'snapshot',
                      >      'scan.fallback-delta-branch' = 'delta');
   Time taken: 0.961 seconds
   spark-sql (default)>
                      > ALTER TABLE `default`.`t$branch_snapshot` SET tblproperties
                      >     ('scan.fallback-snapshot-branch' = 'snapshot',
                      >      'scan.fallback-delta-branch' = 'delta');
   Time taken: 0.667 seconds
   spark-sql (default)>
                      > ALTER TABLE `default`.`t$branch_delta` SET tblproperties
                      >     ('scan.fallback-snapshot-branch' = 'snapshot',
                      >      'scan.fallback-delta-branch' = 'delta');
   Time taken: 0.954 seconds
   spark-sql (default)> insert overwrite `default`.`t$branch_snapshot` partition (date = '20250810')
                      >     values ('1', '1', '1');
   Time taken: 24.562 seconds
   spark-sql (default)> insert overwrite `default`.`t$branch_delta` partition (date = '20250811')
                      >     values ('2', '1', '1');
   Time taken: 21.339 seconds
   spark-sql (default)>
                      > select t1, t2, t3 from default.t where date = '20250811'
                      > ;
   26/02/24 13:06:21 ERROR TaskSetManager: Failed to serialize task 4, not attempting to retry it.
   java.lang.NullPointerException
        at java.base/java.io.DataOutputStream.writeUTF(Unknown Source)
        at java.base/java.io.DataOutputStream.writeUTF(Unknown Source)
        at org.apache.paimon.table.source.ChainSplit.serialize(ChainSplit.java:146)
        at org.apache.paimon.table.source.ChainSplit.writeObject(ChainSplit.java:115)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at java.base/java.io.ObjectStreamClass.invokeWriteObject(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeSerialData(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeObject0(Unknown Source)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeSerialData(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeObject0(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeObject(Unknown Source)
        at scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:516)
        at jdk.internal.reflect.GeneratedMethodAccessor130.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at java.base/java.io.ObjectStreamClass.invokeWriteObject(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeSerialData(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeObject0(Unknown Source)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeSerialData(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeObject0(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeObject(Unknown Source)
        at scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:516)
        at jdk.internal.reflect.GeneratedMethodAccessor130.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at java.base/java.io.ObjectStreamClass.invokeWriteObject(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeSerialData(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeObject0(Unknown Source)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeSerialData(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeObject0(Unknown Source)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeSerialData(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeObject0(Unknown Source)
        at java.base/java.io.ObjectOutputStream.writeObject(Unknown Source)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.TaskSetManager.prepareLaunchingTask(TaskSetManager.scala:530)
        at org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:494)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:470)
        at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2(TaskSchedulerImpl.scala:414)
        at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$2$adapted(TaskSchedulerImpl.scala:409)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:409)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
        at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:399)
        at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20(TaskSchedulerImpl.scala:606)
        at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$20$adapted(TaskSchedulerImpl.scala:601)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
        at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16(TaskSchedulerImpl.scala:601)
        at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$16$adapted(TaskSchedulerImpl.scala:574)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:574)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.$anonfun$makeOffers$1(CoarseGrainedSchedulerBackend.scala:366)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$$withLock(CoarseGrainedSchedulerBackend.scala:1058)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:360)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:188)
        at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
        at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
   26/02/24 13:06:21 ERROR TaskSchedulerImpl: Resource offer failed, task set TaskSet_6.0 was not serializable
   Job aborted due to stage failure: Failed to serialize task 4, not attempting to retry it. Exception during serialization: java.lang.NullPointerException
   org.apache.spark.SparkException: Job aborted due to stage failure: Failed to serialize task 4, not attempting to retry it. Exception during serialization: java.lang.NullPointerException
        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2856)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2792)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2791)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2791)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1247)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1247)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1247)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3060)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2994)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2983)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:989)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2393)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2414)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2433)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2458)
        at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1049)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:410)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:1048)
        at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:448)
        at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:475)
        at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$2(SparkSQLDriver.scala:76)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:76)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:501)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:619)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:613)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:613)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:310)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1034)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:199)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:222)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1125)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1134)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   ```
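   For context (my reading of the trace, not confirmed against the Paimon source): `DataOutputStream.writeUTF` throws a `NullPointerException` when handed a null string, so `ChainSplit.serialize` (ChainSplit.java:146) is presumably calling `writeUTF` on a field that is null for splits produced under this chain-table/fallback-branch setup. A minimal sketch of that failure mode (the `branchName` variable is hypothetical, only to illustrate the null):

   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;

   public class WriteUtfNpeDemo {

       // Returns true if writeUTF(null) throws NullPointerException.
       static boolean throwsOnNull() throws IOException {
           DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
           out.writeUTF("snapshot"); // non-null strings serialize fine
           try {
               String branchName = null; // hypothetical: an unset field on the split
               out.writeUTF(branchName); // NPE, matching the top of the stack trace
               return false;
           } catch (NullPointerException e) {
               return true;
           }
       }

       public static void main(String[] args) throws IOException {
           System.out.println("writeUTF(null) throws NPE: " + throwsOnNull());
       }
   }
   ```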
   
   ### What doesn't meet your expectations?
   
   The query should not fail with the `NullPointerException` above; with the fallback branches configured, it is expected to return the rows written to both branches:
   
   ```
   spark-sql (default)> select t1, t2, t3 from default.t where date = '20250811';
   1    1       1
   2    1       1
   Time taken: 6.136 seconds, Fetched 2 row(s)
   ```
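   If the null field turns out to be legitimate for these splits, one common fix is to serialize optional strings with a presence flag rather than a bare `writeUTF`. A null-safe sketch (hypothetical helper names, not Paimon's actual code):

   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.DataInputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;

   public class NullableUtf {

       // Write a possibly-null string: a presence flag, then the bytes if present.
       static void writeNullableUtf(DataOutputStream out, String s) throws IOException {
           out.writeBoolean(s != null);
           if (s != null) {
               out.writeUTF(s);
           }
       }

       // Read back a string written by writeNullableUtf.
       static String readNullableUtf(DataInputStream in) throws IOException {
           return in.readBoolean() ? in.readUTF() : null;
       }

       // Round-trips a null and a non-null value; true if both survive.
       static boolean roundTripOk() throws IOException {
           ByteArrayOutputStream buf = new ByteArrayOutputStream();
           DataOutputStream out = new DataOutputStream(buf);
           writeNullableUtf(out, null);   // a bare writeUTF would NPE here
           writeNullableUtf(out, "delta");
           DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
           return readNullableUtf(in) == null && "delta".equals(readNullableUtf(in));
       }

       public static void main(String[] args) throws IOException {
           System.out.println("round trip ok: " + roundTripOk());
       }
   }
   ```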
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!

