MaitreyaManohar opened a new issue, #12978:
URL: https://github.com/apache/hudi/issues/12978

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? Yes.
   
   - Join the mailing list to engage in conversations and get faster support at 
[email protected].
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   `_hoodie_commit_time` is NULL for some rows when performing a streaming query
   with Flink SQL. I have also reproduced the issue with the Flink Table API for
   Java, so it is not specific to the SQL client; I am demonstrating it in Flink
   SQL because that is the easiest way to reproduce it.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Run these SQL commands in the Flink SQL client on the jobmanager in one
   shell (let's call this shell 1).
   ```
   CREATE TABLE hudi_table (
     _hoodie_commit_time STRING,
     uuid STRING PRIMARY KEY NOT ENFORCED, -- Record key
     test BOOLEAN -- Check column
   ) 
   WITH (
     'connector' = 'hudi',
     'path' = 's3a://test-bucket/test', -- Path to your Hudi table
     'table.type' = 'COPY_ON_WRITE', -- Specify the table type (COW or MOR)
     'cdc.enabled' = 'true', -- Enable CDC
     'hoodie.datasource.write.recordkey.field' = 'uuid' -- Record key field
   );
   ```
   2. In a second SQL client shell, run these commands (let's call this shell 2).
   ```
   CREATE TABLE hudi_table (
     _hoodie_commit_time STRING,
     uuid STRING PRIMARY KEY NOT ENFORCED, -- Record key
     test BOOLEAN -- Check column
   ) 
   WITH (
     'connector' = 'hudi',
     'path' = 's3a://test-bucket/test', -- Path to your Hudi table
     'table.type' = 'COPY_ON_WRITE', -- Specify the table type (COW or MOR)
     'cdc.enabled' = 'true', -- Enable CDC
     'hoodie.datasource.write.recordkey.field' = 'uuid' -- Record key field
   );
   SELECT * FROM hudi_table /*+ OPTIONS('read.streaming.enabled'='true') */;
   ```
   At this point the streaming query starts but shows no rows yet, since nothing
   has been written to the table.
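   As a diagnostic variant of the streaming read, the starting commit can also be
   pinned explicitly (a sketch; `read.start-commit` accepts `'earliest'` or a
   commit timestamp, and the expected-behavior section below mentions trying
   different values of this option):

   ```
   -- Streaming read that replays all commits from the beginning of the
   -- timeline instead of starting from the latest commit.
   SELECT * FROM hudi_table
   /*+ OPTIONS('read.streaming.enabled'='true', 'read.start-commit'='earliest') */;
   ```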
   
   3. In shell 1, run these commands (wait a little before running the second
   INSERT).
   ```
   INSERT INTO hudi_table (uuid, test) VALUES ('1', true);
   -- Wait for this row to appear in the shell 2 output (it arrives with
   -- _hoodie_commit_time populated). When it does, run the INSERT below.
   INSERT INTO hudi_table (uuid, test) VALUES ('4', true);
   ```
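   To check whether the NULL comes from the streaming/CDC read path or from the
   write itself, a plain batch snapshot read of the same table can be run in
   either shell (a diagnostic sketch, not part of the original reproduction):

   ```
   -- Batch (non-streaming) snapshot read of the same table. If
   -- _hoodie_commit_time is populated here for uuid = '4', the NULL is
   -- introduced by the streaming/CDC read path rather than by the write.
   SELECT _hoodie_commit_time, uuid, test FROM hudi_table;
   ```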
   4. Output in shell 2 is shown below
   ```
                        SQL Query Result (Table)

    Refresh: 1 s        Page: Last of 1        Updated: 08:08:50.900

              _hoodie_commit_time                           uuid   test
                20250314080450519                              1   TRUE
                           <NULL>                              4   TRUE
   ```
   **Expected behavior**
   
   `_hoodie_commit_time` should never be NULL, regardless of which query I use
   or what `read.start-commit` value I use. Right now, it is NULL for some rows.
   
   **Environment Description**
   
   * Hudi version : 0.15.0
   
   * Spark version :
   
   * Hive version :
   
   * Hadoop version : 3.3.4
   
   * Storage (HDFS/S3/GCS..) : Minio (S3)
   
   * Running on Docker? (yes/no) : yes
   
   
   **Additional context**
   
   I am running Flink in Docker and have added the required Hudi jars to the lib
   folder.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
