[ https://issues.apache.org/jira/browse/HUDI-8328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Y Ethan Guo updated HUDI-8328:
------------------------------
Fix Version/s: 1.1.0
(was: 1.0.0)
> Filegroup name seems incorrect for log file created with NBCC
> -------------------------------------------------------------
>
> Key: HUDI-8328
> URL: https://issues.apache.org/jira/browse/HUDI-8328
> Project: Apache Hudi
> Issue Type: Bug
> Components: multi-writer
> Reporter: Jonathan Vexler
> Assignee: xi chaomin
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.1.0
>
> Attachments: TestSparkNonBlockingConcurrencyControl.java,
> bulkInsertFirst=false.txt, bulkInsertFirst=true.txt
>
>
> The test "testMultiBaseFile" in the attached file reproduces the problem:
> [^TestSparkNonBlockingConcurrencyControl.java]
>
> With bulkInsertFirst=true the test passes, but with bulkInsertFirst=false it
> fails.
>
> This is because the name of the filegroup created by the bulk insert at the
> end seems to be wrong.
> I have attached a copy of my terminal output inspecting the tables for both
> tests, and I have extracted the relevant info below so it is easier to read.
> Take a look at those files if you think something in the summary below looks
> wrong.
>
>
> Here is the timeline for bulkInsertFirst=true:
> In this case, we do a bulk insert, then 2 overlapping upserts, then another
> bulk insert.
> {code:java}
> 20241008155534129.deltacommit.inflight
> 20241008155534129.deltacommit.requested
> 20241008155534129_20241008155538371.deltacommit
> 20241008155538785.deltacommit.inflight
> 20241008155538785.deltacommit.requested
> 20241008155538785_20241008155539942.deltacommit
> 20241008155539336.deltacommit.inflight
> 20241008155539336.deltacommit.requested
> 20241008155539336_20241008155540151.deltacommit
> 20241008155540193.deltacommit.inflight
> 20241008155540193.deltacommit.requested
> 20241008155540193_20241008155540768.deltacommit {code}
> And here are the files in the table:
> {code:java}
> .00000000-0000-0000-0000-000000000000-0_20241008155538785.log.1_0-24-34
> .00000000-0000-0000-0000-000000000000-0_20241008155539336.log.1_0-30-45
> .00000000-0000-0000-0000-000000000000-0_20241008155540193.log.1_0-50-74
> 00000000-0000-0000-0000-000000000000-0_0-12-14_20241008155534129.parquet
> {code}
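To make "overlapping" concrete: each completed instant above is named `<requestedTime>_<completionTime>.deltacommit`, and two writes overlap when each was requested before the other completed. The standalone Java sketch below checks that on the bulkInsertFirst=true timeline. The `Instant` record and the helper methods are hypothetical illustrations written for this issue, not Hudi's `HoodieInstant` API, and the name layout is assumed from the listing above.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapCheck {
    // Hypothetical record for one completed instant, parsed from a file name of the
    // assumed form "<requestedTime>_<completionTime>.deltacommit".
    record Instant(String requested, String completed) {}

    static Instant parse(String fileName) {
        // Drop the ".deltacommit" suffix, then split requested/completion times.
        String stem = fileName.substring(0, fileName.indexOf('.'));
        String[] parts = stem.split("_");
        return new Instant(parts[0], parts[1]);
    }

    // Two instants overlap if each was requested before the other completed.
    // The timestamps are fixed-width 17-digit strings, so string order equals time order.
    static boolean overlaps(Instant a, Instant b) {
        return a.requested().compareTo(b.completed()) < 0
            && b.requested().compareTo(a.completed()) < 0;
    }

    // Returns "reqA ~ reqB" for every pair of overlapping instants.
    static List<String> overlappingPairs(List<String> completedFiles) {
        List<Instant> instants = new ArrayList<>();
        for (String f : completedFiles) {
            instants.add(parse(f));
        }
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < instants.size(); i++) {
            for (int j = i + 1; j < instants.size(); j++) {
                if (overlaps(instants.get(i), instants.get(j))) {
                    pairs.add(instants.get(i).requested() + " ~ " + instants.get(j).requested());
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Completed instants from the bulkInsertFirst=true timeline.
        List<String> completed = List.of(
            "20241008155534129_20241008155538371.deltacommit",
            "20241008155538785_20241008155539942.deltacommit",
            "20241008155539336_20241008155540151.deltacommit",
            "20241008155540193_20241008155540768.deltacommit");
        System.out.println(overlappingPairs(completed));
    }
}
```

Running `main` prints `[20241008155538785 ~ 20241008155539336]`: only the two upserts overlap, and neither bulk insert overlaps anything.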
>
> Here is the timeline for bulkInsertFirst=false:
> In this case, we do 2 overlapping upserts, then a bulk insert.
> {code:java}
> 20241008155116873.deltacommit.inflight
> 20241008155116873.deltacommit.requested
> 20241008155116873_20241008155118089.deltacommit
> 20241008155117398.deltacommit.inflight
> 20241008155117398.deltacommit.requested
> 20241008155117398_20241008155118282.deltacommit
> 20241008155118321.deltacommit.inflight
> 20241008155118321.deltacommit.requested
> 20241008155118321_20241008155118833.deltacommit {code}
> And here are the files in the table:
> {code:java}
> .00000000-0000-0000-0000-000000000000_20241008155116873.log.1_0-71-102
> .00000000-0000-0000-0000-000000000000_20241008155117398.log.1_0-77-113
> .00000000-0000-0000-0000-0_20241008155118321.log.1_0-97-142{code}
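> As you can see, the third log file looks different from all the rest: the
> first two log files use the file ID 00000000-0000-0000-0000-000000000000,
> while the third uses the shorter ID 00000000-0000-0000-0000-0, so it does
> not fall into the same file group. (In the bulkInsertFirst=true case, every
> file uses 00000000-0000-0000-0000-000000000000-0.)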
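A quick way to check the mismatch is to pull the file ID out of each file name and group by it: one key means one file group. The Java sketch below does that with regular expressions. The name layouts in the patterns are assumptions inferred from the listings in this issue (not copied from Hudi's own parsing code), and `fileId` / `groupByFileId` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileGroupCheck {
    // Assumed layouts, inferred from the listings above:
    //   log file:  .<fileId>_<instantTime>.log.<version>_<writeToken>
    //   base file: <fileId>_<writeToken>_<instantTime>.parquet
    private static final Pattern LOG = Pattern.compile("^\\.(.+?)_(\\d+)\\.log\\.(\\d+)_(.+)$");
    private static final Pattern BASE = Pattern.compile("^(.+?)_(.+?)_(\\d+)\\.parquet$");

    // Extracts the file ID (first group) from a log or base file name.
    static String fileId(String name) {
        Matcher m = LOG.matcher(name);
        if (m.matches()) {
            return m.group(1);
        }
        m = BASE.matcher(name);
        if (m.matches()) {
            return m.group(1);
        }
        throw new IllegalArgumentException("Unrecognized file name: " + name);
    }

    // Groups file names by file ID, keeping first-seen order; one key per file group.
    static Map<String, List<String>> groupByFileId(List<String> names) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String n : names) {
            groups.computeIfAbsent(fileId(n), k -> new ArrayList<>()).add(n);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Files from the bulkInsertFirst=false table.
        List<String> bulkInsertFirstFalse = List.of(
            ".00000000-0000-0000-0000-000000000000_20241008155116873.log.1_0-71-102",
            ".00000000-0000-0000-0000-000000000000_20241008155117398.log.1_0-77-113",
            ".00000000-0000-0000-0000-0_20241008155118321.log.1_0-97-142");
        groupByFileId(bulkInsertFirstFalse).keySet().forEach(System.out::println);
    }
}
```

For the bulkInsertFirst=false listing this prints two keys, `00000000-0000-0000-0000-000000000000` and `00000000-0000-0000-0000-0`, while the four bulkInsertFirst=true files all map to the single key `00000000-0000-0000-0000-000000000000-0`.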
--
This message was sent by Atlassian Jira
(v8.20.10#820010)