jiangok2006 opened a new issue #2437:
URL: https://github.com/apache/hudi/issues/2437
Hi, my long-running DeltaStreamer job failed with:
```
Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :17
    at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:264)
    at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:97)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    ... 3 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
```
This is the command:
```
spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.4,org.apache.hudi:hudi-utilities-bundle_2.11:0.6.0 \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  https://repo1.maven.org/maven2/org/apache/hudi/hudi-utilities-bundle_2.11/0.6.0/hudi-utilities-bundle_2.11-0.6.0.jar \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --props /tmp/kafka-source.properties \
  --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
  --table-type COPY_ON_WRITE \
  --target-base-path s3://hudi/output \
  --target-table output \
  --op UPSERT \
  --continuous \
  --min-sync-interval-seconds 300 \
  --source-ordering-field processingTime \
  --hoodie-conf hoodie.datasource.write.recordkey.field=id \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=partitionId \
  --hoodie-conf bootstrap.servers=kafka.net:6020 \
  --hoodie-conf sasl.mechanism=SCRAM-SHA-256 \
  --hoodie-conf security.protocol=SASL_SSL \
  --hoodie-conf sasl.jaas.config="org.apache.kafka.common.security.scram.ScramLoginModule required username=\"USERNAME\" password=\"PASSWORD\";" \
  --hoodie-conf hoodie.deltastreamer.source.kafka.topic=mytopic \
  --hoodie-conf schema.registry.url=https://schema-registry.net:443 \
  --hoodie-conf hoodie.deltastreamer.schemaprovider.registry.url=https://schema-registry.net/subjects/com.myschema/versions/latest
```
The error occurs intermittently (e.g., every several minutes) and crashes
DeltaStreamer. Sometimes the partition named in the error does not exist, but
sometimes it does. I do not believe it is caused by a schema change. Thanks for
any clues.