xylaaaaa opened a new pull request, #64807:
URL: https://github.com/apache/doris/pull/64807

   ## Proposed changes
   
   Fix ORC timestamp decoding to round nanoseconds to Doris microseconds 
instead of truncating them. This keeps `CAST(timestamp AS VARCHAR)` aligned 
with Hive/Trino prefix expectations for values like `2020-01-02 03:04:05.321`.
   
   The same decode path is used by nested timestamps in array/map/struct 
columns, so this also covers complex type projections.
   
   ## Problem summary
   
   ORC stores timestamp fractional seconds as nanoseconds, while Doris 
`DATETIMEV2(6)` keeps microseconds. The previous conversion truncated nanos 
with `/ 1000`, so an ORC value such as `320999999ns` became `.320999` instead 
of `.321000`. Prefix predicates like:
   
   ```sql
   CAST(ts AS VARCHAR) LIKE '2020-01-02 03:04:05.321%'
   ```
   
   could therefore miss rows created by Hive/Trino ORC writers.
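
To make the mismatch concrete, here is a minimal sketch of how the two conversions change the six-digit fraction that ends up in the string form. The helper names (`format_micros`, `truncated_fraction`, `rounded_fraction`) are hypothetical and are not taken from the Doris code base. Half-up rounding is an assumption consistent with the `999999500ns` carry threshold described in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: render a microsecond count as a zero-padded
// six-digit fraction, the way it would appear after the decimal point.
std::string format_micros(int64_t micros) {
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%06lld", static_cast<long long>(micros));
    return std::string(buf);
}

// Old behaviour as described in the PR: drop the sub-microsecond digits.
std::string truncated_fraction(int64_t nanos) {
    assert(nanos >= 0 && nanos < 1000000000);
    return format_micros(nanos / 1000);
}

// New behaviour as described in the PR: round half up to microseconds.
// This sketch ignores the carry into the next second, so it is only
// meaningful for nanos below 999999500.
std::string rounded_fraction(int64_t nanos) {
    assert(nanos >= 0 && nanos < 999999500);
    return format_micros((nanos + 500) / 1000);
}
```

With `320999999ns`, `truncated_fraction` yields `320999`, which does not start with `321`, while `rounded_fraction` yields `321000`, which does.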
   
   ## Solution
   
   - Round ORC nanoseconds to microseconds during timestamp decode.
   - Carry `999999500ns` and above into the next second.
   - Apply the same helper to `TIMESTAMP` and `TIMESTAMP_INSTANT` decode paths.
   - Add a BE unit test covering rounding and second carry.
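
The rounding and carry rules above can be sketched as a small standalone helper. This is an illustration under stated assumptions, not the actual code in this PR: the struct `SecondsAndMicros` and the function `round_nanos_to_micros` are hypothetical names, and the sketch assumes the ORC reader supplies a seconds value plus a non-negative nanosecond fraction in `[0, 1000000000)`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: whole seconds plus a microsecond fraction
// that is always in [0, 1000000).
struct SecondsAndMicros {
    int64_t seconds;
    int32_t micros;
};

// Round a nanosecond fraction half up to microseconds. If rounding
// reaches a full second (nanos >= 999999500), carry into `seconds`
// and reset the fraction to zero.
inline SecondsAndMicros round_nanos_to_micros(int64_t seconds, int64_t nanos) {
    assert(nanos >= 0 && nanos < 1000000000);
    int64_t micros = (nanos + 500) / 1000;
    if (micros == 1000000) {
        seconds += 1;
        micros = 0;
    }
    return SecondsAndMicros{seconds, static_cast<int32_t>(micros)};
}
```

For example, `round_nanos_to_micros(5, 320999999)` gives 5 seconds and 321000 microseconds, while `round_nanos_to_micros(5, 999999500)` gives 6 seconds and 0 microseconds. Doing the carry inside one helper keeps the `TIMESTAMP` and `TIMESTAMP_INSTANT` paths from diverging.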
   
   ## Test plan
   
   - `ninja -j 8 doris_be_test`
   - `./be/ut_build_RELEASE/test/doris_be_test --gtest_filter='OrcReaderFillDataTest.TestTimestampNanosecondsRoundToMicroseconds'`
   - `./be/ut_build_RELEASE/test/doris_be_test --gtest_filter='OrcReaderFillDataTest.*'`
   - `git diff --check`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

