Hans-Raintree commented on issue #13827:
URL: https://github.com/apache/hudi/issues/13827#issuecomment-3249465200

   Hey @cshuo,
   
   I tried with parallelism 3:
   
   ```
   SNAPSHOT_COUNT=3
   INCREMENTAL_COUNT=2
   
   +-------------------+---------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
   |_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                     |ts |uuid   |rider|city|
   +-------------------+---------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
   |20250903135712613  |20250903135712613_1_8|id-0001           |sf                    |00000000-0000-0000-0000-000000000000-0|1  |id-0001|r11  |sf  |
   |20250903135712613  |20250903135712613_0_7|id-0002           |nyc                   |00000000-0000-0000-0000-000000000000-0|2  |id-0002|r21  |nyc |
   |20250903135712613  |20250903135712613_1_9|id-0003           |la                    |00000000-0000-0000-0000-000000000000-0|3  |id-0003|r31  |la  |
   +-------------------+---------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
   
   
   +-------------------+---------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
   |_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                     |ts |uuid   |rider|city|
   +-------------------+---------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
   |20250903135712613  |20250903135712613_1_8|id-0001           |sf                    |00000000-0000-0000-0000-000000000000-0|1  |id-0001|r11  |sf  |
   |20250903135712613  |20250903135712613_0_7|id-0002           |nyc                   |00000000-0000-0000-0000-000000000000-0|2  |id-0002|r21  |nyc |
   +-------------------+---------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
   
   sf\.00000000-0000-0000-0000-000000000000-0_20250903135712613.log.1_1-3-0
   nyc\.00000000-0000-0000-0000-000000000000-0_20250903135712613.log.1_0-3-0
   la\.00000000-0000-0000-0000-000000000000-0_20250903135712613.log.1_1-3-0
   
   It looks like, although the parallelism, write tasks, etc. were all set to 3,
   two records went to the same subtask and one subtask got zero records.
   
   I tried with 5 values as well:
   
   INSERT INTO t_mor VALUES
   (1,'id-0001', 'r11', 'sf'),
   (2,'id-0002', 'r21', 'nyc'),
   (3,'id-0003', 'r31', 'la'),
   (4,'id-0004', 'r41', 'hu'),
   (5,'id-0005', 'r51', 'wd');
   
   SNAPSHOT_COUNT=5
   INCREMENTAL_COUNT=3
   
   +-------------------+----------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
   |_hoodie_commit_time|_hoodie_commit_seqno  |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                     |ts |uuid   |rider|city|
   +-------------------+----------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
   |20250903141316770  |20250903141316770_1_11|id-0001           |sf                    |00000000-0000-0000-0000-000000000000-0|1  |id-0001|r11  |sf  |
   |20250903141316770  |20250903141316770_0_12|id-0002           |nyc                   |00000000-0000-0000-0000-000000000000-0|2  |id-0002|r21  |nyc |
   |20250903141316770  |20250903141316770_1_14|id-0003           |la                    |00000000-0000-0000-0000-000000000000-0|3  |id-0003|r31  |la  |
   |20250903141316770  |20250903141316770_2_10|id-0004           |hu                    |00000000-0000-0000-0000-000000000000-0|4  |id-0004|r41  |hu  |
   |20250903141316770  |20250903141316770_0_13|id-0005           |wd                    |00000000-0000-0000-0000-000000000000-0|5  |id-0005|r51  |wd  |
   +-------------------+----------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
   
   
   +-------------------+----------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
   |_hoodie_commit_time|_hoodie_commit_seqno  |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                     |ts |uuid   |rider|city|
   +-------------------+----------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
   |20250903141316770  |20250903141316770_1_14|id-0003           |la                    |00000000-0000-0000-0000-000000000000-0|3  |id-0003|r31  |la  |
   |20250903141316770  |20250903141316770_2_10|id-0004           |hu                    |00000000-0000-0000-0000-000000000000-0|4  |id-0004|r41  |hu  |
   |20250903141316770  |20250903141316770_0_13|id-0005           |wd                    |00000000-0000-0000-0000-000000000000-0|5  |id-0005|r51  |wd  |
   +-------------------+----------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
   
   hu\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_2-3-0
   la\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_1-3-0
   nyc\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_0-3-0
   sf\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_1-3-0
   wd\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_0-3-0
   ```
   
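   It does seem to be related to the file-name suffix: in both runs the
   incremental read returns exactly one record per distinct suffix (`0-3-0`,
   `1-3-0`, `2-3-0`), and the records whose log file shares a suffix with
   another partition's log file are missing.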
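
The suffix observation can be checked mechanically against the file listings above. The following is a minimal sketch, not part of Hudi: `group_by_suffix` is a hypothetical helper, and it assumes the suffix in question is whatever follows the last underscore in each log file name.

```python
from collections import defaultdict

# Log file names copied verbatim from the two runs in this comment.
RUN_3_ROWS = [
    r"sf\.00000000-0000-0000-0000-000000000000-0_20250903135712613.log.1_1-3-0",
    r"nyc\.00000000-0000-0000-0000-000000000000-0_20250903135712613.log.1_0-3-0",
    r"la\.00000000-0000-0000-0000-000000000000-0_20250903135712613.log.1_1-3-0",
]
RUN_5_ROWS = [
    r"hu\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_2-3-0",
    r"la\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_1-3-0",
    r"nyc\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_0-3-0",
    r"sf\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_1-3-0",
    r"wd\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_0-3-0",
]


def group_by_suffix(paths):
    """Map each trailing suffix to the partitions whose log file carries it."""
    groups = defaultdict(list)
    for path in paths:
        partition = path.split("\\", 1)[0]  # text before the first backslash
        suffix = path.rsplit("_", 1)[1]     # assumption: text after the last underscore
        groups[suffix].append(partition)
    return dict(groups)


for label, paths in (("3 rows", RUN_3_ROWS), ("5 rows", RUN_5_ROWS)):
    groups = group_by_suffix(paths)
    print(label, "-> distinct suffixes:", len(groups), groups)
```

The number of distinct suffixes is 2 for the first run and 3 for the second, which matches `INCREMENTAL_COUNT` in both cases. That is consistent with the suffix hypothesis but does not by itself prove it.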

