prashanthpdesai opened a new issue #1695: URL: https://github.com/apache/hudi/issues/1695
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? yes
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

We are trying to use the GLOBAL_BLOOM index for a production use case. Before rolling it out, we tested it with sample data in spark-shell and hit the exception below with the following configuration.

In the first run, test3.csv is ingested successfully, creating 3 partitions from the `ts` column. In the second run we updated only the partition column (`ts`), keeping all other attributes the same (see test4.csv below). Instead of moving the records to the new partitions, the second run fails with the exception below.

**Run 1:**

Input csv file:

```
$ cat test3.csv
fanme,lname,ts,uuid
pd,desai1,2019-10-15,10
pp,sai,2019-10-14,11
pp,sai,2019-10-14,11
prabil,bal,2020-01-30,20
```

spark-shell:

```scala
scala> val table = "hudi_cow1"
scala> val basepath = "/datalake/globalndextest"
scala> val df3 = spark.read.option("header", "true").csv("/datalake/888/test3.csv")
scala> val dfh4 = df3.write.format("org.apache.hudi").
     |   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
     |   option(PARTITIONPATH_FIELD_OPT_KEY, "ts").
     |   option("hoodie.index.type", "GLOBAL_BLOOM").
     |   option("hoodie.bloom.index.update.partition.path", "true").
     |   option(TABLE_NAME, table)
scala> dfh4.mode(Append).save(basepath)
```

Output:

```
scala> spark.read.parquet("/datalake/globalndextest/*").show(false)
+-------------------+--------------------+------------------+----------------------+------------------------------------------------------------------------+------+------+----------+----+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                       |fanme |lname |ts        |uuid|
+-------------------+--------------------+------------------+----------------------+------------------------------------------------------------------------+------+------+----------+----+
|20200529115326     |20200529115326_0_1  |20                |2020-01-30            |86941ae8-a5b6-4d31-b12c-63ec2883a2d3-0_0-52-25756_20200529115326.parquet|prabil|bal   |2020-01-30|20  |
|20200529115326     |20200529115326_2_2  |10                |2019-10-15            |972ee074-942c-417c-93ca-08d18b4e5897-0_2-52-25758_20200529115326.parquet|pd    |desai1|2019-10-15|10  |
|20200529115326     |20200529115326_1_1  |11                |2019-10-14            |7df81054-b67b-402e-bc7e-c935a18ab3eb-0_1-52-25757_20200529115326.parquet|pp    |sai   |2019-10-14|11  |
+-------------------+--------------------+------------------+----------------------+------------------------------------------------------------------------+------+------+----------+----+
```

**Run 2:**

Input file:

```
$ cat test4.csv
fanme,lname,ts,uuid
pd,desai,2019-10-17,10
pp,sai,2019-10-18,11
rg,fg,2019-10-18,25
```

spark-shell (same write configuration as Run 1):

```scala
scala> val table = "hudi_cow1"
scala> val basepath = "/datalake/globalndextest"
scala> val df3 = spark.read.option("header", "true").csv("/datalake/888/test4.csv")
scala> val dfh4 = df3.write.format("org.apache.hudi").
     |   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
     |   option(PARTITIONPATH_FIELD_OPT_KEY, "ts").
     |   option("hoodie.index.type", "GLOBAL_BLOOM").
     |   option("hoodie.bloom.index.update.partition.path", "true").
     |   option(TABLE_NAME, table)
scala> dfh4.mode(Append).save(basepath)
```

Exception:

```
Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :1
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
	at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
	at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.org$apache$spark$executor$Executor$TaskRunner$$anonfun$$res$1(Executor.scala:412)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:419)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1359)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:430)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.NoSuchElementException: No value present in Option
	at org.apache.hudi.common.util.Option.get(Option.java:88)
	at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:74)
	at org.apache.hudi.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:220)
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:177)
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
```

**To Reproduce**

Steps to reproduce the behavior: see Run 1 and Run 2 above.

**Expected behavior**

Since the record key is unchanged and only the partition value is new, and `hoodie.bloom.index.update.partition.path` is set to `true`, we expect Hudi to delete the record from the previous partition and insert it into the new partition.

**Environment Description**

* Hudi version : 0.5.1
* Spark version : 2.2.1
* Hive version :
* Hadoop version : 2.7
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no

**Additional context**

**_In our case the same record key arrives with a different partition value in different runs; the record in the previous partition needs to be deleted and the incoming record ingested into the new partition._**

**Stacktrace**

See the exception under Run 2 above.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
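Editorial note: the partition-move semantics the report expects from `hoodie.bloom.index.update.partition.path=true` can be illustrated with a small, Hudi-independent Scala sketch. All names here (`Record`, `plan`, the `Op` hierarchy) are ours, not Hudi's; this only models the behavior described above, using the keys and partitions from Run 1 and Run 2.

```scala
// Hypothetical model of "update partition path" semantics in a global index.
// A global index maps record key -> partition currently holding that key.
object GlobalIndexSketch {
  case class Record(key: String, partition: String)

  sealed trait Op
  case class Delete(key: String, partition: String) extends Op
  case class Insert(rec: Record)                    extends Op
  case class Update(rec: Record)                    extends Op

  def plan(index: Map[String, String], incoming: Seq[Record]): Seq[Op] =
    incoming.flatMap { rec =>
      index.get(rec.key) match {
        case None                          => Seq(Insert(rec))        // brand-new key
        case Some(p) if p == rec.partition => Seq(Update(rec))        // same partition: plain update
        case Some(p)                       => Seq(Delete(rec.key, p), // key moved: delete from old partition,
                                                  Insert(rec))        // insert into the new one
      }
    }

  def main(args: Array[String]): Unit = {
    // Index state after Run 1, incoming records from Run 2 (test4.csv).
    val index    = Map("10" -> "2019-10-15", "11" -> "2019-10-14", "20" -> "2020-01-30")
    val incoming = Seq(Record("10", "2019-10-17"),
                       Record("11", "2019-10-18"),
                       Record("25", "2019-10-18"))
    plan(index, incoming).foreach(println)
  }
}
```

Under this model, key `10` should produce a delete in `2019-10-15` plus an insert into `2019-10-17`, key `11` a delete plus insert likewise, and the new key `25` a plain insert; instead, the actual upsert fails with the `NoSuchElementException` shown above.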
