jiangok2006 opened a new issue #2437:
URL: https://github.com/apache/hudi/issues/2437


   Hi, my long-running DeltaStreamer failed due to:
   
   ```
    Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :17
        at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:264)
        at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:97)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
        at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:123)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
        ... 3 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException
   ```
   
   This is the command:
   
   ```
    spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.4,org.apache.hudi:hudi-utilities-bundle_2.11:0.6.0 \
        --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
        https://repo1.maven.org/maven2/org/apache/hudi/hudi-utilities-bundle_2.11/0.6.0/hudi-utilities-bundle_2.11-0.6.0.jar \
        --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
        --props /tmp/kafka-source.properties \
        --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
        --table-type COPY_ON_WRITE \
        --target-base-path s3://hudi/output \
        --target-table output \
        --op UPSERT \
        --continuous \
        --min-sync-interval-seconds 300 \
        --source-ordering-field processingTime \
        --hoodie-conf hoodie.datasource.write.recordkey.field=id \
        --hoodie-conf hoodie.datasource.write.partitionpath.field=partitionId \
        --hoodie-conf bootstrap.servers=kafka.net:6020 \
        --hoodie-conf sasl.mechanism=SCRAM-SHA-256 \
        --hoodie-conf security.protocol=SASL_SSL \
        --hoodie-conf sasl.jaas.config="org.apache.kafka.common.security.scram.ScramLoginModule required username=\"USERNAME\" password=\"PASSWORD\";" \
        --hoodie-conf hoodie.deltastreamer.source.kafka.topic=mytopic \
        --hoodie-conf schema.registry.url=https://schema-registry.net:443 \
        --hoodie-conf hoodie.deltastreamer.schemaprovider.registry.url=https://schema-registry.net/subjects/com.myschema/versions/latest
   ```
   
   This error happens intermittently (e.g. every few minutes) and crashes the DeltaStreamer. Sometimes the partition it complains about does not exist, but sometimes it does. It should not be caused by a schema change. Thanks for any clue.
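   Since an Avro writer/reader schema mismatch is a classic cause of `ArrayIndexOutOfBoundsException` during decoding, it may still be worth double-checking that the subject's registered schema really has not changed. A minimal sketch of such a check: the two inline schemas below are hypothetical placeholders standing in for two versions fetched from the Schema Registry (e.g. `.../subjects/com.myschema/versions/latest` and an earlier version), and the field names (`id`, `partitionId`, `processingTime`) are taken from the command above.

   ```python
   import json

   # Hypothetical stand-ins for two schema versions pulled from the registry;
   # in practice these would come from the Schema Registry REST API.
   old_schema = json.loads("""{"type": "record", "name": "Event", "fields": [
       {"name": "id", "type": "string"},
       {"name": "partitionId", "type": "string"},
       {"name": "processingTime", "type": "long"}]}""")
   new_schema = json.loads("""{"type": "record", "name": "Event", "fields": [
       {"name": "id", "type": "string"},
       {"name": "partitionId", "type": "string"},
       {"name": "processingTime", "type": "long"},
       {"name": "payload", "type": ["null", "string"], "default": null}]}""")

   def field_names(schema):
       """Collect the top-level field names of an Avro record schema."""
       return {f["name"] for f in schema["fields"]}

   # Fields present only in one version indicate schema drift.
   added = field_names(new_schema) - field_names(old_schema)
   removed = field_names(old_schema) - field_names(new_schema)
   print("added:", sorted(added))      # → added: ['payload']
   print("removed:", sorted(removed))  # → removed: []
   ```

   If both sets come back empty for the real schema versions, schema drift can be ruled out with more confidence; a non-empty diff would point at an evolution the old reader path may not handle.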


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
