CrystalCat commented on issue #9865:
URL: https://github.com/apache/hudi/issues/9865#issuecomment-1768765812

   @ad1happy2go Thank you for your reply.  
   I ran the SQL you posted and got an ArrayIndexOutOfBoundsException too. Comparing it to mine, I found that my SQL inserts all of the events data, so every partition path gets created. So I made a small change based on your SQL: insert an extra record to create partition path event=b.
   ```sql
   drop table hudi_events;
   CREATE TABLE default.hudi_events (
     timestamp INT,
     visitorid INT,
     event STRING,
     itemid INT,
     transactionid INT
   ) USING HUDI
   PARTITIONED BY (event)
   TBLPROPERTIES (
     primaryKey = 'visitorid',
     preCombineField = 'timestamp',
     hoodie.index.type = 'GLOBAL_BLOOM',
     type = 'cow'
   );
   
   insert into hudi_events values (1,1,1,1,'a');
   --create partition path event=b
   insert into hudi_events values (1,1,1,1,'b');
   
   Drop table default.events_incremental;
   CREATE TABLE default.events_incremental (
     timestamp INT,
     visitorid INT,
     event STRING,
     itemid INT,
     transactionid INT
   ) USING PARQUET;
   
   
   insert into events_incremental values (1,1,'b',2,1);
   
   merge into hudi_events as target
   using events_incremental as source
   on target.timestamp = source.timestamp
   when matched then update set *
   when not matched then insert *
   ;
   
   ```
   Everything works fine, without an error. In this case, updating a record into an existing partition path is fine, while updating a record into a non-existing partition path raises the ArrayIndexOutOfBoundsException.
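   In other words, the failing variant is the same script with the second insert removed, so that partition path event=b does not exist before the merge. A minimal sketch of that case (same tables as above):
   ```sql
   -- only partition path event=a exists; the insert for event=b is omitted
   insert into hudi_events values (1,1,1,1,'a');
   
   -- the matched row is updated from partition event=a into the
   -- non-existing partition event=b
   merge into hudi_events as target
   using events_incremental as source
   on target.timestamp = source.timestamp
   when matched then update set *
   when not matched then insert *;
   -- -> ArrayIndexOutOfBoundsException
   ```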
   
   case 2#
   Working through my own SQL again:
   1. Create the table hudi_events.
   2. Insert the full events data.
   3. Merge the full events data into hudi_events.
   4. Select the first 800000 rows of table events.
   5. Merge those 800000 records into hudi_events.

   Step 2 already creates all the partition paths.
   Step 3 merges the full events data without an error.
   Steps 4 and 5 merge the first 800000 records after the previous merge operation.
   
   case 3#
   So I performed the following steps:
   1. Create the table hudi_events.
   2. Insert the full events data.
   3. Select the first 800000 rows of table events.
   4. Merge those 800000 records into hudi_events.
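   Sketched in SQL for clarity (the `events` source table name, the merge key, and the `limit 800000` selection here are placeholders for my actual queries):
   ```sql
   -- 1. create the Hudi table: same DDL as hudi_events above

   -- 2. insert the full events data; all partition paths get created
   insert into hudi_events select * from events;

   -- 3./4. merge only the first 800000 records back in
   merge into hudi_events as target
   using (select * from events limit 800000) as source
   on target.visitorid = source.visitorid
   when matched then update set *
   when not matched then insert *;
   -- -> HoodieUpsertException caused by ArrayIndexOutOfBoundsException
   ```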
   ```
   Failed to upsert for commit time 20231018233057356
   org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit 
time 20231018233057356
   .....
   23/10/18 23:34:19 WARN TaskSetManager: Lost task 18.0 in stage 668.0 (TID 
4999) (dm executor driver): TaskKilled (Stage cancelled)
        at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.buildProfile(BaseSparkCommitActionExecutor.java:200)
        at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:174)
        at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:86)
        at 
org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:67)
        ... 66 more
   Caused by: java.lang.ArrayIndexOutOfBoundsException
   ```
   I got the ArrayIndexOutOfBoundsException again.
   
   
   Comparing case 2# to case 3#, the only difference is removing the step `merge into with full event data`. Why does merging all events work fine, while merging the first 800000 events (a subset of all events) goes wrong?
   
   

