Jonathan Vexler created HUDI-8328:
-------------------------------------

             Summary: Filegroup name seems incorrect for log file created with NBCC
                 Key: HUDI-8328
                 URL: https://issues.apache.org/jira/browse/HUDI-8328
             Project: Apache Hudi
          Issue Type: Bug
          Components: multi-writer
            Reporter: Jonathan Vexler
         Attachments: TestSparkNonBlockingConcurrencyControl.java, bulkInsertFirst=false.txt, bulkInsertFirst=true.txt

The failing test is "testMultiBaseFile" in the attached file:

[^TestSparkNonBlockingConcurrencyControl.java]

 
The test passes for bulkInsertFirst=true but fails for bulkInsertFirst=false.
 
This is because the name of the filegroup created by the bulk insert at the end 
seems to be wrong.

I have attached a copy of my terminal output showing the tables for both tests, 
and I have extracted the relevant info here so it is easier to read. Take a look 
at those files if you think something looks wrong with the info below.
 
 
Here is the timeline for bulkInsertFirst=true:
{code:java}
20241008155534129.deltacommit.inflight
20241008155534129.deltacommit.requested
20241008155534129_20241008155538371.deltacommit 
20241008155538785.deltacommit.inflight
20241008155538785.deltacommit.requested
20241008155538785_20241008155539942.deltacommit
20241008155539336.deltacommit.inflight
20241008155539336.deltacommit.requested   
20241008155539336_20241008155540151.deltacommit
20241008155540193.deltacommit.inflight
20241008155540193.deltacommit.requested
20241008155540193_20241008155540768.deltacommit {code}
And here are the files in the table:
{code:java}
.00000000-0000-0000-0000-000000000000-0_20241008155538785.log.1_0-24-34
.00000000-0000-0000-0000-000000000000-0_20241008155539336.log.1_0-30-45
.00000000-0000-0000-0000-000000000000-0_20241008155540193.log.1_0-50-74
00000000-0000-0000-0000-000000000000-0_0-12-14_20241008155534129.parquet {code}
 
Here is the timeline for bulkInsertFirst=false:
{code:java}
20241008155116873.deltacommit.inflight
20241008155116873.deltacommit.requested
20241008155116873_20241008155118089.deltacommit
20241008155117398.deltacommit.inflight 
20241008155117398.deltacommit.requested
20241008155117398_20241008155118282.deltacommit
20241008155118321.deltacommit.inflight
20241008155118321.deltacommit.requested
20241008155118321_20241008155118833.deltacommit {code}
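
To see which writers actually ran concurrently, the timeline can be reduced to its completed instants. The sketch below is only an illustration, not Hudi code: it assumes a completed instant is named `<requested time>_<completion time>.<action>`, which is inferred from the listing above, and the helper names are hypothetical.

```python
import re
from itertools import combinations

# Assumed layout of a completed instant, inferred from the listing:
# <requested time>_<completion time>.<action>
COMPLETED = re.compile(r"^(?P<requested>\d{17})_(?P<completed>\d{17})\.(?P<action>\w+)$")

TIMELINE_BULK_INSERT_FIRST_FALSE = [
    "20241008155116873.deltacommit.inflight",
    "20241008155116873.deltacommit.requested",
    "20241008155116873_20241008155118089.deltacommit",
    "20241008155117398.deltacommit.inflight",
    "20241008155117398.deltacommit.requested",
    "20241008155117398_20241008155118282.deltacommit",
    "20241008155118321.deltacommit.inflight",
    "20241008155118321.deltacommit.requested",
    "20241008155118321_20241008155118833.deltacommit",
]


def completed_instants(names):
    """Return sorted (requested, completed) integer pairs for completed instants."""
    pairs = []
    for name in names:
        match = COMPLETED.match(name.strip())
        if match:  # .requested and .inflight markers do not match and are skipped
            pairs.append((int(match.group("requested")), int(match.group("completed"))))
    return sorted(pairs)


def overlapping(pairs):
    """Return every pair of instants whose requested-to-completed windows intersect."""
    return [(a, b) for a, b in combinations(pairs, 2) if a[0] < b[1] and b[0] < a[1]]


if __name__ == "__main__":
    for first, second in overlapping(completed_instants(TIMELINE_BULK_INSERT_FIRST_FALSE)):
        print(first, "overlaps", second)
```

Under that assumption, the first two deltacommits overlap in time and the third starts only after both have completed.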
And here are the files in the table:

{code:java}
.00000000-0000-0000-0000-000000000000_20241008155116873.log.1_0-71-102
.00000000-0000-0000-0000-000000000000_20241008155117398.log.1_0-77-113
 {code}
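
The difference between the two listings is easier to see once the file ID is pulled out of each name. The following is a minimal sketch rather than Hudi's own parsing logic: the two regular expressions are assumptions based only on the names shown above (log file: leading dot, file ID, instant time, `.log.`, version, write token; base file: file ID, write token, instant time, `.parquet`), and the helper names are hypothetical.

```python
import re

# Assumed name layouts, inferred from the listings in this report.
LOG_FILE = re.compile(
    r"^\.(?P<file_id>.+)_(?P<instant>\d+)\.log\.(?P<version>\d+)_(?P<token>\d+-\d+-\d+)$"
)
BASE_FILE = re.compile(
    r"^(?P<file_id>.+)_(?P<token>\d+-\d+-\d+)_(?P<instant>\d+)\.parquet$"
)

BULK_INSERT_FIRST_TRUE = [
    ".00000000-0000-0000-0000-000000000000-0_20241008155538785.log.1_0-24-34",
    ".00000000-0000-0000-0000-000000000000-0_20241008155539336.log.1_0-30-45",
    ".00000000-0000-0000-0000-000000000000-0_20241008155540193.log.1_0-50-74",
    "00000000-0000-0000-0000-000000000000-0_0-12-14_20241008155534129.parquet",
]

BULK_INSERT_FIRST_FALSE = [
    ".00000000-0000-0000-0000-000000000000_20241008155116873.log.1_0-71-102",
    ".00000000-0000-0000-0000-000000000000_20241008155117398.log.1_0-77-113",
]


def file_id(name):
    """Extract the file ID from a log or base file name."""
    for pattern in (LOG_FILE, BASE_FILE):
        match = pattern.match(name.strip())
        if match:
            return match.group("file_id")
    raise ValueError(f"unrecognized file name: {name}")


def file_ids(names):
    """Return the distinct file IDs that appear in a listing."""
    return {file_id(name) for name in names}


if __name__ == "__main__":
    print("bulkInsertFirst=true :", sorted(file_ids(BULK_INSERT_FIRST_TRUE)))
    print("bulkInsertFirst=false:", sorted(file_ids(BULK_INSERT_FIRST_FALSE)))
```

With these patterns, every file in the bulkInsertFirst=true listing resolves to a file ID ending in `-0`, while the log files in the bulkInsertFirst=false listing resolve to the same ID without the `-0` suffix, and no base file appears in that listing at all.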



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
