Hans-Raintree commented on issue #13827:
URL: https://github.com/apache/hudi/issues/13827#issuecomment-3249465200
Hey @cshuo,
I tried with parallelism 3:
```
SNAPSHOT_COUNT=3
INCREMENTAL_COUNT=2
+-------------------+---------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                     |ts |uuid   |rider|city|
+-------------------+---------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
|20250903135712613  |20250903135712613_1_8|id-0001           |sf                    |00000000-0000-0000-0000-000000000000-0|1  |id-0001|r11  |sf  |
|20250903135712613  |20250903135712613_0_7|id-0002           |nyc                   |00000000-0000-0000-0000-000000000000-0|2  |id-0002|r21  |nyc |
|20250903135712613  |20250903135712613_1_9|id-0003           |la                    |00000000-0000-0000-0000-000000000000-0|3  |id-0003|r31  |la  |
+-------------------+---------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
+-------------------+---------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                     |ts |uuid   |rider|city|
+-------------------+---------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
|20250903135712613  |20250903135712613_1_8|id-0001           |sf                    |00000000-0000-0000-0000-000000000000-0|1  |id-0001|r11  |sf  |
|20250903135712613  |20250903135712613_0_7|id-0002           |nyc                   |00000000-0000-0000-0000-000000000000-0|2  |id-0002|r21  |nyc |
+-------------------+---------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
sf\.00000000-0000-0000-0000-000000000000-0_20250903135712613.log.1_1-3-0
nyc\.00000000-0000-0000-0000-000000000000-0_20250903135712613.log.1_0-3-0
la\.00000000-0000-0000-0000-000000000000-0_20250903135712613.log.1_1-3-0
It looks like although the parallelism and write tasks were set to 3, two
records went to the same subtask and one subtask got zero records.
I tried with 5 values as well:
INSERT INTO t_mor VALUES
(1,'id-0001', 'r11', 'sf'),
(2,'id-0002', 'r21', 'nyc'),
(3,'id-0003', 'r31', 'la'),
(4,'id-0004', 'r41', 'hu'),
(5,'id-0005', 'r51', 'wd');
SNAPSHOT_COUNT=5
INCREMENTAL_COUNT=3
+-------------------+----------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
|_hoodie_commit_time|_hoodie_commit_seqno  |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                     |ts |uuid   |rider|city|
+-------------------+----------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
|20250903141316770  |20250903141316770_1_11|id-0001           |sf                    |00000000-0000-0000-0000-000000000000-0|1  |id-0001|r11  |sf  |
|20250903141316770  |20250903141316770_0_12|id-0002           |nyc                   |00000000-0000-0000-0000-000000000000-0|2  |id-0002|r21  |nyc |
|20250903141316770  |20250903141316770_1_14|id-0003           |la                    |00000000-0000-0000-0000-000000000000-0|3  |id-0003|r31  |la  |
|20250903141316770  |20250903141316770_2_10|id-0004           |hu                    |00000000-0000-0000-0000-000000000000-0|4  |id-0004|r41  |hu  |
|20250903141316770  |20250903141316770_0_13|id-0005           |wd                    |00000000-0000-0000-0000-000000000000-0|5  |id-0005|r51  |wd  |
+-------------------+----------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
+-------------------+----------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
|_hoodie_commit_time|_hoodie_commit_seqno  |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                     |ts |uuid   |rider|city|
+-------------------+----------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
|20250903141316770  |20250903141316770_1_14|id-0003           |la                    |00000000-0000-0000-0000-000000000000-0|3  |id-0003|r31  |la  |
|20250903141316770  |20250903141316770_2_10|id-0004           |hu                    |00000000-0000-0000-0000-000000000000-0|4  |id-0004|r41  |hu  |
|20250903141316770  |20250903141316770_0_13|id-0005           |wd                    |00000000-0000-0000-0000-000000000000-0|5  |id-0005|r51  |wd  |
+-------------------+----------------------+------------------+----------------------+--------------------------------------+---+-------+-----+----+
hu\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_2-3-0
la\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_1-3-0
nyc\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_0-3-0
sf\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_1-3-0
wd\.00000000-0000-0000-0000-000000000000-0_20250903141316770.log.1_0-3-0
```
It does seem like it's related to the suffix, as only the unique records
show up.
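
A rough way to sanity-check this against the output above. This is only a
sketch, and it assumes that the middle field of `_hoodie_commit_seqno` is the
index of the subtask that wrote the record (it lines up with the leading
number of the write-token suffix in the log file names, e.g. `_1_8` for the
`sf` record and `1-3-0` for the `sf` log file). The helper name
`group_by_subtask` is made up for illustration and is not part of Hudi.

```python
from collections import defaultdict


def group_by_subtask(seqnos):
    """Group commit sequence numbers by their middle field.

    Assumption (not confirmed in this thread): the value has the shape
    <commit_time>_<subtask_index>_<record_index>.
    """
    groups = defaultdict(list)
    for seqno in seqnos:
        _commit_time, subtask, _record_index = seqno.split("_")
        groups[int(subtask)].append(seqno)
    return dict(groups)


# Sequence numbers copied from the two full (5-column-wide) result tables above.
run_3_records = [
    "20250903135712613_1_8",
    "20250903135712613_0_7",
    "20250903135712613_1_9",
]
run_5_records = [
    "20250903141316770_1_11",
    "20250903141316770_0_12",
    "20250903141316770_1_14",
    "20250903141316770_2_10",
    "20250903141316770_0_13",
]

for label, seqnos in (("3 records", run_3_records), ("5 records", run_5_records)):
    groups = group_by_subtask(seqnos)
    print(f"{label}: snapshot={len(seqnos)}, distinct subtasks={len(groups)}")
```

With the values above, the number of distinct subtasks is 2 for the first run
and 3 for the second, which is the same as INCREMENTAL_COUNT in each run. That
would fit one record per suffix surviving, though it does not show why.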