[I] Two records inserted by two insert into operations go to the same parquet file [hudi]

via GitHub Thu, 11 Dec 2025 04:30:18 -0800


bithw1 opened a new issue, #17570:
URL: https://github.com/apache/hudi/issues/17570


   ### Describe the problem you faced
   
   I am using Hudi 0.15.0 with Spark SQL and ran the operations below.
   
   I insert two records into the table hudi_cow_20251211 one at a time. When I 
query the table, I find that both records are in the same parquet file, 
which looks incorrect to me: I expected them to be in two different parquet files.
   
   
   
   ```
   set hoodie.datasource.write.operation=insert;
   set hoodie.spark.sql.insert.into.operation=insert;
   
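   -- Copy-on-write table keyed on column a, with c as the precombine field
   CREATE TABLE IF NOT EXISTS hudi_cow_20251211 (
     a INT,
     b INT,
     c INT
   )
   USING hudi
   tblproperties(
   type='cow',
   primaryKey='a',
   preCombineField='c'
   );
   
   -- Two separate inserts, each producing its own commit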
   
   insert into hudi_cow_20251211 select 1,1,1;
   
   insert into hudi_cow_20251211 select 1,11,111;
   
   ```
   
   Then I query the table with `select * from hudi_cow_20251211;`. The result is:
   
   ```
   20251211201308842       20251211201308842_0_0   1               966b7c8e-66e6-40b7-badf-fe4234bb9f23-0_0-270-227_20251211201314945.parquet      1       1       1
   20251211201314945       20251211201314945_0_1   1               966b7c8e-66e6-40b7-badf-fe4234bb9f23-0_0-270-227_20251211201314945.parquet      1       11      111
   
   ```
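   
   For readability, the same check can be done with a narrower query. This is only a sketch: it assumes the standard Hudi metadata columns `_hoodie_commit_time` and `_hoodie_file_name` are populated (the default), and it selects them next to the data columns so that the commit and the base file of each record are easy to compare.
   
   ```
   -- Show, per record, the commit that wrote it and the base file it currently lives in
   select _hoodie_commit_time, _hoodie_file_name, a, b, c
   from hudi_cow_20251211
   order by _hoodie_commit_time;
   ```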
   
   I wonder why the two records (written by two separate commits) go to the same parquet 
file (966b7c8e-66e6-40b7-badf-fe4234bb9f23-0_0-270-227_20251211201314945.parquet). 
I think they should be written to two different parquet files (each corresponding to 
one file slice). Have I missed something?
   
   
   
   
   
   ### Environment Description
   
   * Hudi version: 0.15.0
   * Spark version:
   * Flink version:
   * Hive version:
   * Hadoop version:
   * Storage (HDFS/S3/GCS..):
   * Running on Docker? (yes/no):
   
   


