yaojiejia opened a new pull request, #18426:
URL: https://github.com/apache/hudi/pull/18426

   ### Describe the issue this Pull Request addresses
   
   Closes #17152
   
   Incremental query timestamp options 
(`hoodie.datasource.read.begin.instanttime` / 
`hoodie.datasource.read.end.instanttime`) were previously accepted as raw 
strings with no validation/normalization, allowing invalid inputs like `42` to 
pass through. Time travel (`as.of.instant`) also lacked support for epoch 
seconds/millis and ISO `T`-separated timestamps.
   
   ### Summary and Changelog
   
   This PR adds **early validation** and **normalization** for incremental 
start/end instants and expands supported timestamp input formats to include ISO 
`T` timestamps and epoch seconds/millis.
   
   Changelog:
   - Extend `HoodieSqlCommonUtils.formatQueryInstant` to support:
     - ISO datetime with `T` separator (e.g. `2025-01-02T03:04:56.789`)
     - Epoch seconds (10-digit numeric strings)
     - Epoch millis (13-digit numeric strings)
     - Improved error message listing supported formats
   - Add `HoodieSqlCommonUtils.formatIncrementalInstant` to validate/normalize 
incremental instants while allowing sentinel values:
     - `"earliest"`, `"000"`, and bootstrap instants (`00000000000000`, 
`00000000000001`, `00000000000002`)
   - Normalize `START_COMMIT` / `END_COMMIT` in 
`DataSourceOptionsHelper.parametersWithReadDefaults` so invalid incremental 
instants fail fast before relations/file-indexes are constructed.
   - Apply the same validation/normalization in 
`HoodieTableChangesOptionsParser` (`hudi_table_changes`) for start/end instants.
   - Add unit tests covering accepted/rejected formats and option normalization 
(`TestInstantTimeValidation`).
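
The accepted formats listed above can be illustrated with a small, language-neutral sketch. This is not the Scala code in `HoodieSqlCommonUtils`; the names `normalize_instant` and `SENTINELS` are hypothetical. It also assumes that timestamps are interpreted in UTC and that normalized output is a 17-digit `yyyyMMddHHmmssSSS` string, neither of which is stated in this description.

```python
import re
from datetime import datetime, timezone

# Sentinel values this PR description says stay valid for incremental instants.
SENTINELS = {"earliest", "000", "00000000000000", "00000000000001", "00000000000002"}


def _to_instant(dt: datetime) -> str:
    """Render a datetime as a 17-digit yyyyMMddHHmmssSSS string (assumed target format)."""
    return dt.strftime("%Y%m%d%H%M%S") + f"{dt.microsecond // 1000:03d}"


def normalize_instant(value: str, allow_sentinels: bool = False) -> str:
    """Hypothetical stand-in for the validation/normalization described above."""
    text = value.strip()
    if allow_sentinels and text in SENTINELS:
        return text  # sentinels pass through untouched
    if re.fullmatch(r"\d{14}|\d{17}", text):
        # Already a compact instant: check that the first 14 digits form a real timestamp.
        datetime.strptime(text[:14], "%Y%m%d%H%M%S")
        return text
    if re.fullmatch(r"\d{10}", text):
        # Epoch seconds (10-digit numeric string), interpreted as UTC by assumption.
        return _to_instant(datetime.fromtimestamp(int(text), tz=timezone.utc))
    if re.fullmatch(r"\d{13}", text):
        # Epoch millis (13-digit numeric string); keep the millisecond part exactly.
        millis = int(text)
        whole = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
        return whole.strftime("%Y%m%d%H%M%S") + f"{millis % 1000:03d}"
    if "T" in text:
        # ISO datetime with a `T` separator; fromisoformat raises ValueError if malformed.
        return _to_instant(datetime.fromisoformat(text))
    raise ValueError(
        f"Unsupported instant {value!r}; expected yyyyMMddHHmmss[SSS], "
        "an ISO 'T' timestamp, epoch seconds (10 digits) or epoch millis (13 digits)"
    )
```

With these assumptions, `normalize_instant("2025-01-02T03:04:56.789")` yields `20250102030456789`, while `normalize_instant("42")` raises `ValueError`.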
   
   
   ### Impact
   
   User-facing behavior change:
   - Incremental reads now **fail fast** for unsupported/invalid timestamp 
formats in `hoodie.datasource.read.begin.instanttime` / 
`hoodie.datasource.read.end.instanttime` (e.g., `"42"`), instead of silently 
accepting them.
   - Incremental and time-travel timestamp parsing now supports additional 
common formats:
     - ISO timestamps with `T` separator
     - Epoch seconds and epoch millis
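
To make the fail-fast behavior concrete, the sketch below checks the two option keys when the options map is assembled, before any read is attempted. It is deliberately simplified: the helper `check_incremental_options` is hypothetical, and it recognizes only compact `yyyyMMddHHmmss[SSS]` instants plus the sentinel values, not the additional ISO and epoch formats.

```python
import re

BEGIN_KEY = "hoodie.datasource.read.begin.instanttime"
END_KEY = "hoodie.datasource.read.end.instanttime"

# Sentinel values that the PR description says remain accepted.
SENTINELS = {"earliest", "000", "00000000000000", "00000000000001", "00000000000002"}

# Simplification: only compact 14- or 17-digit instants are recognized here.
INSTANT_PATTERN = re.compile(r"\d{14}(\d{3})?")


def check_incremental_options(options: dict) -> dict:
    """Return a copy of `options`, raising ValueError on an invalid begin/end instant."""
    checked = dict(options)
    for key in (BEGIN_KEY, END_KEY):
        if key not in checked:
            continue  # option not set; nothing to validate
        value = str(checked[key]).strip()
        if value not in SENTINELS and not INSTANT_PATTERN.fullmatch(value):
            raise ValueError(f"Invalid instant for {key}: {value!r}")
        checked[key] = value
    return checked
```

Under this sketch, an options map with `hoodie.datasource.read.begin.instanttime` set to `"42"` is rejected immediately, whereas `"earliest"` or `"20250102030456"` passes through unchanged.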
   
   ### Risk Level
   
   low
   
   This change is limited to option parsing/normalization and only affects 
cases where users provide invalid or newly-supported timestamp formats. 
Existing valid Hudi instants (`yyyyMMddHHmmss[SSS]`) and existing sentinel 
values continue to work.
   
   Verification:
   - `mvn compile -pl hudi-spark-datasource/hudi-spark-common -am -DskipTests`
   - `mvn test-compile -pl hudi-spark-datasource/hudi-spark-common -am -DskipTests`
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable
   
   

