Jonathan Vexler created HUDI-8328:
-------------------------------------
Summary: Filegroup name seems incorrect for log file created with
NBCC
Key: HUDI-8328
URL: https://issues.apache.org/jira/browse/HUDI-8328
Project: Apache Hudi
Issue Type: Bug
Components: multi-writer
Reporter: Jonathan Vexler
Attachments: TestSparkNonBlockingConcurrencyControl.java,
bulkInsertFirst=false.txt, bulkInsertFirst=true.txt
The failing test is "testMultiBaseFile" in the attached
[^TestSparkNonBlockingConcurrencyControl.java].
The test passes with bulkInsertFirst=true but fails with
bulkInsertFirst=false.
The failure seems to occur because the filegroup created by the final bulk
insert has the wrong name.
I have attached a copy of my terminal output listing the tables for both runs,
and I have extracted the relevant info below so it is easier to read. Take a
look at the attached files if you think something looks wrong with the info below.
Here is the timeline for bulkInsertFirst=true:
{code:java}
20241008155534129.deltacommit.inflight
20241008155534129.deltacommit.requested
20241008155534129_20241008155538371.deltacommit
20241008155538785.deltacommit.inflight
20241008155538785.deltacommit.requested
20241008155538785_20241008155539942.deltacommit
20241008155539336.deltacommit.inflight
20241008155539336.deltacommit.requested
20241008155539336_20241008155540151.deltacommit
20241008155540193.deltacommit.inflight
20241008155540193.deltacommit.requested
20241008155540193_20241008155540768.deltacommit
{code}
And here are the files in the table:
{code:java}
.00000000-0000-0000-0000-000000000000-0_20241008155538785.log.1_0-24-34
.00000000-0000-0000-0000-000000000000-0_20241008155539336.log.1_0-30-45
.00000000-0000-0000-0000-000000000000-0_20241008155540193.log.1_0-50-74
00000000-0000-0000-0000-000000000000-0_0-12-14_20241008155534129.parquet
{code}
Here is the timeline for bulkInsertFirst=false:
{code:java}
20241008155116873.deltacommit.inflight
20241008155116873.deltacommit.requested
20241008155116873_20241008155118089.deltacommit
20241008155117398.deltacommit.inflight
20241008155117398.deltacommit.requested
20241008155117398_20241008155118282.deltacommit
20241008155118321.deltacommit.inflight
20241008155118321.deltacommit.requested
20241008155118321_20241008155118833.deltacommit
{code}
And here are the files in the table:
{code:java}
.00000000-0000-0000-0000-000000000000_20241008155116873.log.1_0-71-102
.00000000-0000-0000-0000-000000000000_20241008155117398.log.1_0-77-113
{code}
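To make the difference between the two listings easier to see, here is a minimal sketch that extracts the file ID prefix from each file name. With bulkInsertFirst=true every listed file carries the ID ending in "-0", while with bulkInsertFirst=false the two log files carry the ID without that suffix (and the listing above shows no base file). The class name FileIdSketch and both regular expressions are assumptions inferred only from the names shown in this report; this is not Hudi's own file name parsing.
{code:java}
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileIdSketch {
    // Assumed layouts, inferred from the listings in this report only:
    //   log file:  .<fileId>_<instant>.log.<version>_<writeToken>
    //   base file: <fileId>_<writeToken>_<instant>.parquet
    private static final Pattern LOG_FILE =
        Pattern.compile("^\\.(.+)_(\\d{17})\\.log\\.(\\d+)_(.+)$");
    private static final Pattern BASE_FILE =
        Pattern.compile("^([^_]+)_([^_]+)_(\\d{17})\\.parquet$");

    // Returns the file ID portion of a log or base file name.
    static String fileId(String name) {
        Matcher log = LOG_FILE.matcher(name);
        if (log.matches()) {
            return log.group(1);
        }
        Matcher base = BASE_FILE.matcher(name);
        if (base.matches()) {
            return base.group(1);
        }
        throw new IllegalArgumentException("Unrecognized file name: " + name);
    }

    // Collects the distinct file IDs in listing order.
    static Set<String> fileIds(List<String> names) {
        Set<String> ids = new LinkedHashSet<>();
        for (String name : names) {
            ids.add(fileId(name));
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> bulkInsertFirstTrue = List.of(
            ".00000000-0000-0000-0000-000000000000-0_20241008155538785.log.1_0-24-34",
            ".00000000-0000-0000-0000-000000000000-0_20241008155539336.log.1_0-30-45",
            ".00000000-0000-0000-0000-000000000000-0_20241008155540193.log.1_0-50-74",
            "00000000-0000-0000-0000-000000000000-0_0-12-14_20241008155534129.parquet");
        List<String> bulkInsertFirstFalse = List.of(
            ".00000000-0000-0000-0000-000000000000_20241008155116873.log.1_0-71-102",
            ".00000000-0000-0000-0000-000000000000_20241008155117398.log.1_0-77-113");

        System.out.println("bulkInsertFirst=true  file IDs: " + fileIds(bulkInsertFirstTrue));
        System.out.println("bulkInsertFirst=false file IDs: " + fileIds(bulkInsertFirstFalse));
    }
}
{code}
If the bulk insert in the bulkInsertFirst=false run writes its base file under the ID ending in "-0", the log files above would not belong to the same filegroup; that is my reading of the listings, not something I have confirmed in the code.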
--
This message was sent by Atlassian Jira
(v8.20.10#820010)