andygrove opened a new pull request, #3502: URL: https://github.com/apache/datafusion-comet/pull/3502
## Summary

- Fix the SPARK-40819 test failure where `TIMESTAMP(NANOS)` values lose precision when read with `LEGACY_PARQUET_NANOS_AS_LONG=true`
- The `SparkPhysicalExprAdapter` was routing `Timestamp(Nanosecond) → Int64` casts through Spark's `Cast` expression, which divides by `MICROS_PER_SECOND` (10^6) on the assumption that the value has microsecond precision. With nanosecond values this truncates the result to milliseconds (e.g., `1668537129123534758` becomes `1668537129123`)
- The fix routes `Timestamp → Int64` through `CometCastColumnExpr` instead, which uses Arrow's cast to reinterpret the raw i64 value without any conversion (see the sketch at the end of this description)

## Test plan

- [x] `SPARK-40819: parquet file with TIMESTAMP(NANOS, true) (with nanosAsLong=true)` — now passes
- [x] `SPARK-40819: parquet file with TIMESTAMP(NANOS, true) (with default nanosAsLong=false)` — still passes (expects error)
- [ ] Verify no regressions in CI `ParquetSchemaSuite` tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)
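The difference between the two cast paths can be illustrated with a minimal standalone sketch. This uses the arrow-rs `cast` kernel directly rather than the actual `SparkPhysicalExprAdapter` / `CometCastColumnExpr` code in this PR; the example program and the `MICROS_PER_SECOND` constant here are illustrative assumptions, not code taken from the Comet source tree:

```rust
use std::sync::Arc;

use arrow::array::{Array, ArrayRef, Int64Array, TimestampNanosecondArray};
use arrow::compute::cast;
use arrow::datatypes::DataType;

fn main() {
    // The nanosecond timestamp value from the description above.
    let nanos: i64 = 1_668_537_129_123_534_758;
    let ts: ArrayRef = Arc::new(TimestampNanosecondArray::from(vec![nanos]));

    // Arrow's cast kernel reinterprets the underlying i64 storage when
    // casting Timestamp(Nanosecond) to Int64, so no precision is lost.
    let as_long = cast(&ts, &DataType::Int64).expect("cast should succeed");
    let as_long = as_long.as_any().downcast_ref::<Int64Array>().unwrap();
    assert_eq!(as_long.value(0), nanos);

    // Spark's Cast(TimestampType -> LongType) instead divides the stored
    // value by MICROS_PER_SECOND (10^6), assuming microsecond precision.
    // Applied to a nanosecond value, that truncates it to milliseconds.
    const MICROS_PER_SECOND: i64 = 1_000_000; // assumed constant, for illustration
    assert_eq!(nanos / MICROS_PER_SECOND, 1_668_537_129_123);
}
```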
