mbutrovich commented on PR #3146:
URL: https://github.com/apache/datafusion-comet/pull/3146#issuecomment-4245074985

   Thanks @andygrove! Claude summarized my notes for me. Hopefully it didn't 
transcribe anything wrong or hallucinate :)
   
   > ## PR #3146: `timestamp_seconds`
   > 
   > ### Overflow behavior mismatch with Spark
   > 
   > This is the main concern. Spark's `SecondsToTimestamp` for integer inputs 
uses `Math.multiplyExact`, which throws `ArithmeticException("long overflow")` 
on overflow. The Rust implementation uses `checked_mul` and maps overflow to 
`None` (null). So Comet silently returns null where Spark would throw.
   > 
   > Either return an error on overflow to match Spark, or document the 
deviation and mark it `Incompatible` in the serde's `getSupportLevel`.
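To make the difference concrete, here is a rough plain-Rust sketch of the two behaviors (the helper names are mine, not the PR's actual function names):

```rust
// Spark's SecondsToTimestamp multiplies seconds by 1_000_000 to get microseconds.
const MICROS_PER_SECOND: i64 = 1_000_000;

/// Sketch of the current Comet behavior: overflow silently becomes null.
fn seconds_to_micros_nullable(seconds: i64) -> Option<i64> {
    seconds.checked_mul(MICROS_PER_SECOND)
}

/// Sketch of the Spark-matching behavior: overflow becomes an error,
/// mirroring Math.multiplyExact throwing ArithmeticException("long overflow").
fn seconds_to_micros_strict(seconds: i64) -> Result<i64, String> {
    seconds
        .checked_mul(MICROS_PER_SECOND)
        .ok_or_else(|| "long overflow".to_string())
}

fn main() {
    assert_eq!(seconds_to_micros_nullable(7), Some(7_000_000));
    assert_eq!(seconds_to_micros_nullable(i64::MAX), None); // null, no error
    assert!(seconds_to_micros_strict(i64::MAX).is_err()); // Spark-style failure
}
```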
   > 
   > ### NULL buffer handling
   > 
   > All three input type paths build the output via `.iter().map().collect()`, which constructs the null buffer row by row through Arrow's builder. For Int32 and Int64 no new nulls are introduced, so the output null buffer is identical to the input's: it could be cloned directly and the values computed with Arrow's `unary` kernel or direct buffer arithmetic, avoiding the per-row null-tracking overhead. Float64 is trickier since NaN/Infinity introduce new nulls, so iterative construction is more justifiable there.
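A plain-Rust sketch of the two strategies for an Int64 column, modeling Arrow's value buffer and validity bitmap as a `Vec<i64>` and `Vec<bool>` (in the real code this would be Arrow's `unary` kernel; the function names here are illustrative):

```rust
const MICROS_PER_SECOND: i64 = 1_000_000;

/// Builder-style path (what the PR does): every row's null-ness is
/// inspected and re-recorded while collecting.
fn per_row(values: &[Option<i64>]) -> Vec<Option<i64>> {
    values
        .iter()
        .map(|v| v.map(|s| s * MICROS_PER_SECOND))
        .collect()
}

/// Buffer-level path (unary-kernel style): transform the value buffer
/// unconditionally and carry the validity bitmap over unchanged. Null
/// slots may hold arbitrary values, so use wrapping arithmetic there.
fn buffer_level(values: &[i64], validity: &[bool]) -> (Vec<i64>, Vec<bool>) {
    let out = values
        .iter()
        .map(|s| s.wrapping_mul(MICROS_PER_SECOND))
        .collect();
    (out, validity.to_vec())
}

fn main() {
    let rows = [Some(1), None, Some(3)];
    let (values, validity) = ([1_i64, 0, 3], [true, false, true]);
    let a = per_row(&rows);
    let (b_vals, b_valid) = buffer_level(&values, &validity);
    // Valid slots agree; the validity bitmap is the input's, untouched.
    assert_eq!(a, vec![Some(1_000_000), None, Some(3_000_000)]);
    assert_eq!(b_vals[0], 1_000_000);
    assert_eq!(b_valid, validity.to_vec());
}
```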
   > 
   > ### Missing input types
   > 
   > Spark accepts `NumericType` (Byte, Short, Int, Long, Float, Double, 
Decimal). The PR only handles Int32, Int64, Float64. Float32 would be trivial 
to add (cast to f64 and use the Float64 path). Decimal is more complex and 
could be deferred.
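For Float32 the widening could look like this sketch (function names are hypothetical; the NaN/Infinity-to-null and range checks follow the Float64 behavior described above):

```rust
/// Sketch of the Float64 path: non-finite input and out-of-range
/// results become null.
fn f64_seconds_to_micros(seconds: f64) -> Option<i64> {
    if !seconds.is_finite() {
        return None; // NaN / Infinity -> null
    }
    let micros = seconds * 1_000_000.0;
    // Reject values that cannot be represented as a 64-bit microsecond count.
    if micros >= i64::MIN as f64 && micros < i64::MAX as f64 {
        Some(micros as i64)
    } else {
        None
    }
}

/// Float32 support is just a widening cast into the Float64 path.
fn f32_seconds_to_micros(seconds: f32) -> Option<i64> {
    f64_seconds_to_micros(seconds as f64)
}

fn main() {
    assert_eq!(f32_seconds_to_micros(1.5), Some(1_500_000));
    assert_eq!(f32_seconds_to_micros(f32::NAN), None);
    assert_eq!(f64_seconds_to_micros(f64::INFINITY), None);
}
```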
   > 
   > ### Test gaps
   > 
   > A few cases worth adding:
   > - NaN input: `SELECT timestamp_seconds(CAST('NaN' AS DOUBLE))` (should 
return null)
   > - Infinity input: `SELECT timestamp_seconds(CAST('Infinity' AS DOUBLE))` 
(should return null)
   > - Overflow: `SELECT timestamp_seconds(9223372036854775807)` (Spark throws; what does Comet do?)
   > - Int32 column reference (currently only `bigint` column in the test table)
   > - Float64 column reference (currently only literal casts)
   > 
   > ### Docs
   > 
   > `docs/spark_expressions_support.md` has `timestamp_seconds` marked `[ ]`. 
Should be updated to `[x]`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

