CrystalCat commented on issue #9865:
URL: https://github.com/apache/hudi/issues/9865#issuecomment-1768765812

   @ad1happy2go Thank you for your reply.  
   I ran the SQL you posted and got an ArrayIndexOutOfBoundsException too. Comparing it to mine, I found that my SQL inserts all of the events data, so every partition path gets created. So I made a small change based on your SQL: insert an extra record to create partition path event=b.
   ```sql
   drop table hudi_events;
   CREATE TABLE default.hudi_events (
     timestamp INT,
     visitorid INT,
     event STRING,
     itemid INT,
     transactionid INT
   ) USING HUDI
   PARTITIONED BY (event)
   TBLPROPERTIES (
     primaryKey = 'visitorid',
     preCombineField = 'timestamp',
     hoodie.index.type = 'GLOBAL_BLOOM',
     type = 'cow'
   );
   
   insert into hudi_events values (1,1,1,1,'a');
   --create partition path event=b
   insert into hudi_events values (1,1,1,1,'b');
   
   Drop table default.events_incremental;
   CREATE TABLE default.events_incremental (
     timestamp INT,
     visitorid INT,
     event STRING,
     itemid INT,
     transactionid INT
   ) USING PARQUET;
   
   
   insert into events_incremental values (1,1,'b',2,1);
   
   merge into hudi_events as target
   using events_incremental as source
   on target.timestamp = source.timestamp
   when matched then update set *
   when not matched then insert *
   ;
   
   ```
   Everything works fine, without an error. In this case, updating a record into an existing partition path is fine, while updating a record into a non-existing partition path raises the ArrayIndexOutOfBoundsException.
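   In other words, the failing variant is the same script with the second insert removed, so that partition path event=b does not exist before the merge. A minimal sketch of that case (same tables as above):
   ```sql
   -- only partition path event=a exists; the insert for event=b is omitted
   insert into hudi_events values (1,1,1,1,'a');
   
   -- the matched row is updated from partition event=a into the
   -- non-existing partition event=b
   merge into hudi_events as target
   using events_incremental as source
   on target.timestamp = source.timestamp
   when matched then update set *
   when not matched then insert *;
   -- -> ArrayIndexOutOfBoundsException
   ```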
   
   case 2#
   Working through my own SQL again:
   1. Create the table hudi_events.
   2. Insert the full events data.
   3. Merge the full events data into hudi_events.
   4. Select the first 800000 rows of table events.
   5. Merge those 800000 records into hudi_events.

   Step 2 already creates all the partition paths.
   Step 3 merges the full events data without an error.
   Steps 4 and 5 merge the first 800000 records after the previous merge operation.
   
   case 3#
   So I performed the following steps:
   1. Create the table hudi_events.
   2. Insert the full events data.
   3. Select the first 800000 rows of table events.
   4. Merge those 800000 records into hudi_events.
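   Sketched in SQL for clarity (the `events` source table name, the merge key, and the `limit 800000` selection here are placeholders for my actual queries):
   ```sql
   -- 1. create the Hudi table: same DDL as hudi_events above

   -- 2. insert the full events data; all partition paths get created
   insert into hudi_events select * from events;

   -- 3./4. merge only the first 800000 records back in
   merge into hudi_events as target
   using (select * from events limit 800000) as source
   on target.visitorid = source.visitorid
   when matched then update set *
   when not matched then insert *;
   -- -> HoodieUpsertException caused by ArrayIndexOutOfBoundsException
   ```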
   ```
   Failed to upsert for commit time 20231018233057356
   org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit 
time 20231018233057356
   .....
   23/10/18 23:34:19 WARN TaskSetManager: Lost task 18.0 in stage 668.0 (TID 
4999) (dm executor driver): TaskKilled (Stage cancelled)
        at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.buildProfile(BaseSparkCommitActionExecutor.java:200)
        at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:174)
        at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:86)
        at 
org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:67)
        ... 66 more
   Caused by: java.lang.ArrayIndexOutOfBoundsException
   ```
   I got the ArrayIndexOutOfBoundsException again.
   
   
   Comparing case 2# to case 3#, the only difference is removing the step `merge into with full event data`. Why does merging all events work fine, while merging the first 800000 events (a subset of all events) goes wrong?
   
   

