MaitreyaManohar opened a new issue, #12978: URL: https://github.com/apache/hudi/issues/12978
**Describe the problem you faced**

`_hoodie_commit_time` is NULL for some rows when running a streaming query with Flink SQL. I have also tried this with the Flink Table API for Java, and the issue occurs there as well. I am demonstrating it in Flink SQL since it is easier to reproduce.

**To Reproduce**

Steps to reproduce the behavior:

1. Run these SQL commands in the Flink SQL client script on the jobmanager, in one shell (let's call this shell 1).

```
CREATE TABLE hudi_table (
    _hoodie_commit_time STRING,
    uuid STRING PRIMARY KEY NOT ENFORCED,  -- Record key
    test BOOLEAN                           -- Check column
) WITH (
    'connector' = 'hudi',
    'path' = 's3a://test-bucket/test',                    -- Path to your Hudi table
    'table.type' = 'COPY_ON_WRITE',                       -- Specify the table type (COW or MOR)
    'cdc.enabled' = 'true',                               -- Enable CDC
    'hoodie.datasource.write.recordkey.field' = 'uuid'    -- Record key field
);
```

2. In a second SQL client shell, run these commands (let's call this shell 2).

```
CREATE TABLE hudi_table (
    _hoodie_commit_time STRING,
    uuid STRING PRIMARY KEY NOT ENFORCED,  -- Record key
    test BOOLEAN                           -- Check column
) WITH (
    'connector' = 'hudi',
    'path' = 's3a://test-bucket/test',                    -- Path to your Hudi table
    'table.type' = 'COPY_ON_WRITE',                       -- Specify the table type (COW or MOR)
    'cdc.enabled' = 'true',                               -- Enable CDC
    'hoodie.datasource.write.recordkey.field' = 'uuid'    -- Record key field
);

select * from hudi_table/*+ OPTIONS('read.streaming.enabled'='true')*/;
```

The output currently after running this command is

3. In shell 1, run these commands (give it some time before running the second insert command).

```
INSERT INTO hudi_table (uuid, test) VALUES ('1', true);
-- Wait for the row to appear in the shell 2 output (this one has the _hoodie_commit_time)
-- When it appears, run the INSERT command below
INSERT INTO hudi_table (uuid, test) VALUES ('4', true);
```

4. The output in shell 2 is shown below.

```
                        SQL Query Result (Table)
 Refresh: 1 s           Page: Last of 1           Updated: 08:08:50.900

   _hoodie_commit_time   uuid   test
     20250314080450519      1   TRUE
                <NULL>      4   TRUE

Q Quit     + Inc Refresh   G Goto Page   N Next Page   O Open Row
R Refresh  - Dec Refresh   L Last Page   P Prev Page
```

**Expected behavior**

`_hoodie_commit_time` should be non-null for every row, regardless of which query I use or what `read.start-commit` value I use. Right now, it is NULL for some rows.

**Environment Description**

* Hudi version : 0.15.0
* Spark version :
* Hive version :
* Hadoop version : 3.3.4
* Storage (HDFS/S3/GCS..) : Minio (S3)
* Running on Docker? (yes/no) : yes

**Additional context**

I am using Docker for Flink, and have added the required Hudi jars to the lib folder.
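
A possible way to narrow this down, offered as a sketch and not verified against this setup: run the streaming read from shell 2 with an explicit `read.start-commit`, filtered to the rows whose commit time is missing, and compare it with a batch snapshot read of the same table. The value `'earliest'` and the switch to batch mode are assumptions chosen for illustration; the table and column names are the ones from the steps above.

```sql
-- Streaming read from the first commit, showing only the affected rows.
-- 'read.start-commit' = 'earliest' is an assumed choice for this check.
SELECT uuid, test
FROM hudi_table
/*+ OPTIONS('read.streaming.enabled' = 'true', 'read.start-commit' = 'earliest') */
WHERE _hoodie_commit_time IS NULL;

-- In a separate session: snapshot (batch) read of the same table, to see
-- whether _hoodie_commit_time is populated outside the streaming path.
SET 'execution.runtime-mode' = 'batch';

SELECT _hoodie_commit_time, uuid, test
FROM hudi_table;
```

If the batch read returns a commit time for uuid `4` while the streaming read does not, that would point at the streaming read path and not at the written data.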
