xylaaaaa opened a new pull request, #64807: URL: https://github.com/apache/doris/pull/64807
## Proposed changes

Fix ORC timestamp decoding to round nanoseconds to Doris microseconds instead of truncating them. This keeps `CAST(timestamp AS VARCHAR)` aligned with Hive/Trino prefix expectations for values like `2020-01-02 03:04:05.321`.

The same decode path is used by nested timestamps in array/map/struct columns, so this also covers complex type projections.

## Problem summary

ORC stores timestamp fractional seconds as nanoseconds, while Doris `DATETIMEV2(6)` keeps microseconds. The previous conversion truncated nanos with `/ 1000`, so an ORC value such as `320999999ns` became `.320999` instead of `.321000`.

Prefix predicates like:

```sql
CAST(ts AS VARCHAR) LIKE '2020-01-02 03:04:05.321%'
```

could therefore miss rows created by Hive/Trino ORC writers.

## Solution

- Round ORC nanoseconds to microseconds during timestamp decode.
- Carry `999999500ns` and above into the next second.
- Apply the same helper to `TIMESTAMP` and `TIMESTAMP_INSTANT` decode paths.
- Add a BE unit test covering rounding and second carry.

## Test plan

- `ninja -j 8 doris_be_test`
- `./be/ut_build_RELEASE/test/doris_be_test --gtest_filter='OrcReaderFillDataTest.TestTimestampNanosecondsRoundToMicroseconds'`
- `./be/ut_build_RELEASE/test/doris_be_test --gtest_filter='OrcReaderFillDataTest.*'`
- `git diff --check`
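
For readers who want to see the arithmetic, the rounding and carry rule described in this PR can be sketched as below. This is a minimal illustration, not the helper added by the patch: the names `SecondsAndMicros`, `truncate_nanos_to_micros`, and `round_nanos_to_micros` are hypothetical, the rounding mode is assumed to be half-up (consistent with `999999500ns` carrying), and the nanosecond field is assumed to lie in `[0, 999999999]`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this sketch; not the struct used in the Doris BE.
struct SecondsAndMicros {
    int64_t seconds;
    int32_t micros;
};

constexpr int64_t kNanosPerMicro = 1000;
constexpr int64_t kMicrosPerSecond = 1000000;

// Old behaviour as described in the PR: plain integer division drops the remainder.
inline int32_t truncate_nanos_to_micros(int64_t nanos) {
    return static_cast<int32_t>(nanos / kNanosPerMicro);
}

// Sketch of the new behaviour: round half-up to the nearest microsecond and
// carry into the seconds field when the rounded value reaches one full second.
// Assumes 0 <= nanos <= 999999999.
inline SecondsAndMicros round_nanos_to_micros(int64_t seconds, int64_t nanos) {
    int64_t micros = (nanos + kNanosPerMicro / 2) / kNanosPerMicro;
    if (micros >= kMicrosPerSecond) {
        micros -= kMicrosPerSecond;
        seconds += 1;
    }
    return {seconds, static_cast<int32_t>(micros)};
}

// Small self-check using the values quoted in the PR description.
inline void check_pr_examples() {
    assert(truncate_nanos_to_micros(320999999) == 320999);
    assert(round_nanos_to_micros(5, 320999999).micros == 321000);
    assert(round_nanos_to_micros(5, 999999500).seconds == 6);
    assert(round_nanos_to_micros(5, 999999500).micros == 0);
}
```

With these definitions, `320999999ns` rounds to `321000` microseconds, so the rendered string starts with `.321` and the `LIKE` prefix predicate matches. A value of `999999499ns` stays within the same second, while `999999500ns` moves to the next second with a zero fractional part.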
