yaojiejia opened a new pull request, #18426:
URL: https://github.com/apache/hudi/pull/18426
### Describe the issue this Pull Request addresses
Closes #17152
Incremental query timestamp options
(`hoodie.datasource.read.begin.instanttime` /
`hoodie.datasource.read.end.instanttime`) were previously accepted as raw
strings with no validation/normalization, allowing invalid inputs like `42` to
pass through. Time travel (`as.of.instant`) also lacked support for epoch
seconds/millis and ISO `T`-separated timestamps.
### Summary and Changelog
This PR adds **early validation** and **normalization** for incremental
start/end instants and expands supported timestamp input formats to include ISO
`T` timestamps and epoch seconds/millis.
Changelog:
- Extend `HoodieSqlCommonUtils.formatQueryInstant` to support:
  - ISO datetime with `T` separator (e.g. `2025-01-02T03:04:56.789`)
  - Epoch seconds (10-digit numeric strings)
  - Epoch millis (13-digit numeric strings)
  - Improved error message listing supported formats
- Add `HoodieSqlCommonUtils.formatIncrementalInstant` to validate/normalize
  incremental instants while allowing sentinel values:
  - `"earliest"`, `"000"`, and bootstrap instants (`00000000000000`,
    `00000000000001`, `00000000000002`)
- Normalize `START_COMMIT` / `END_COMMIT` in
`DataSourceOptionsHelper.parametersWithReadDefaults` so invalid incremental
instants fail fast before relations/file-indexes are constructed.
- Apply the same validation/normalization in
`HoodieTableChangesOptionsParser` (`hudi_table_changes`) for start/end instants.
- Add unit tests covering accepted/rejected formats and option normalization
(`TestInstantTimeValidation`).
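
For readers who want a concrete picture of the normalization rules listed above, here is a minimal, self-contained Java sketch. It is not the code in this PR (the real logic lives in `HoodieSqlCommonUtils`). The class name `InstantNormalizationSketch`, the `normalize` method, the choice of UTC for epoch conversion, and the choice to emit the 17-digit millisecond layout are all assumptions made for illustration. It covers only the formats named in this description.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not the implementation in this PR.
final class InstantNormalizationSketch {
    // Sentinel values that the changelog says must remain accepted as-is.
    private static final Set<String> SENTINELS = new HashSet<>(Arrays.asList(
        "earliest", "000", "00000000000000", "00000000000001", "00000000000002"));

    private static final DateTimeFormatter SECONDS_LAYOUT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    private static final DateTimeFormatter MILLIS_LAYOUT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    static String normalize(String raw) {
        String s = raw.trim();
        if (SENTINELS.contains(s)) {
            return s; // sentinels bypass timestamp validation
        }
        try {
            if (s.matches("\\d{14}(\\d{3})?")) {
                // Existing Hudi instant: check the date part, return unchanged.
                LocalDateTime.parse(s.substring(0, 14), SECONDS_LAYOUT);
                return s;
            }
            if (s.matches("\\d{10}")) {
                // Epoch seconds; UTC is an assumption of this sketch.
                long seconds = Long.parseLong(s);
                return MILLIS_LAYOUT.format(
                    LocalDateTime.ofEpochSecond(seconds, 0, ZoneOffset.UTC));
            }
            if (s.matches("\\d{13}")) {
                // Epoch millis; UTC is an assumption of this sketch.
                Instant instant = Instant.ofEpochMilli(Long.parseLong(s));
                return MILLIS_LAYOUT.format(
                    LocalDateTime.ofInstant(instant, ZoneOffset.UTC));
            }
            if (s.contains("T")) {
                // ISO local datetime with a 'T' separator.
                return MILLIS_LAYOUT.format(LocalDateTime.parse(s));
            }
        } catch (DateTimeException e) {
            throw new IllegalArgumentException(
                "Invalid instant '" + raw + "': " + e.getMessage(), e);
        }
        throw new IllegalArgumentException("Unsupported instant format '" + raw
            + "'. Expected yyyyMMddHHmmss[SSS], an ISO 'T' timestamp, "
            + "epoch seconds (10 digits), or epoch millis (13 digits).");
    }

    public static void main(String[] args) {
        System.out.println(normalize("2025-01-02T03:04:56.789")); // 20250102030456789
        System.out.println(normalize("earliest"));                // earliest
        try {
            normalize("42");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the accepted numeric lengths (10, 13, 14, and 17 digits) do not overlap, a length check is enough to tell epoch values from Hudi instants in this sketch, and a short value such as `42` falls through to the error branch.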
### Impact
User-facing behavior change:
- Incremental reads now **fail fast** for unsupported/invalid timestamp
formats in `hoodie.datasource.read.begin.instanttime` /
`hoodie.datasource.read.end.instanttime` (e.g., `"42"`), instead of silently
accepting them.
- Incremental and time-travel timestamp parsing now supports additional
  common formats:
  - ISO timestamps with `T` separator
  - Epoch seconds and epoch millis
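
To illustrate what "fail fast" means at the option level, the following Java sketch validates the two incremental read options in a plain map before anything else would be built from them. The two option keys come from this description. The class `ReadOptionsSketch`, the method `normalizeInstants`, and the deliberately simplified `toyNormalizer` are hypothetical stand-ins and not Hudi APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of option-level validation; not Hudi's actual code.
final class ReadOptionsSketch {
    static final String BEGIN_KEY = "hoodie.datasource.read.begin.instanttime";
    static final String END_KEY = "hoodie.datasource.read.end.instanttime";

    // Returns a copy of the options with both instants normalized.
    // Throws IllegalArgumentException naming the offending option key.
    static Map<String, String> normalizeInstants(
            Map<String, String> options, UnaryOperator<String> normalizer) {
        Map<String, String> out = new HashMap<>(options);
        for (String key : new String[] {BEGIN_KEY, END_KEY}) {
            String raw = out.get(key);
            if (raw == null) {
                continue; // option not set, nothing to validate
            }
            try {
                out.put(key, normalizer.apply(raw));
            } catch (IllegalArgumentException e) {
                throw new IllegalArgumentException(
                    "Invalid value for " + key + ": '" + raw + "'", e);
            }
        }
        return out;
    }

    // Toy normalizer: accepts only 14- or 17-digit strings (simplification).
    static String toyNormalizer(String raw) {
        String s = raw.trim();
        if (!s.matches("\\d{14}(\\d{3})?")) {
            throw new IllegalArgumentException("Unsupported instant format: '" + raw + "'");
        }
        return s;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(BEGIN_KEY, "42");
        try {
            normalizeInstants(options, ReadOptionsSketch::toyNormalizer);
        } catch (IllegalArgumentException e) {
            // The error surfaces here, before any query would be planned.
            System.out.println(e.getMessage());
        }
    }
}
```

Passing the normalizer in as a function keeps the option-handling step separate from the format rules, so the same check could be reused wherever start/end instants are read.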
### Risk Level
low
This change is limited to option parsing/normalization and only affects
cases where users provide invalid or newly-supported timestamp formats.
Existing valid Hudi instants (`yyyyMMddHHmmss[SSS]`) and existing sentinel
values continue to work.
Verification:
- `mvn compile -pl hudi-spark-datasource/hudi-spark-common -am -DskipTests`
- `mvn test-compile -pl hudi-spark-datasource/hudi-spark-common -am -DskipTests`
### Documentation Update
none
### Contributor's checklist
- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable