mbutrovich commented on PR #3146:
URL:
https://github.com/apache/datafusion-comet/pull/3146#issuecomment-4245074985
Thanks @andygrove! Claude summarized my notes for me. Hopefully it didn't
transcribe anything wrong or hallucinate :)
> ## PR #3146: `timestamp_seconds`
>
> ### Overflow behavior mismatch with Spark
>
> This is the main concern. Spark's `SecondsToTimestamp` for integer inputs
uses `Math.multiplyExact`, which throws `ArithmeticException("long overflow")`
on overflow. The Rust implementation uses `checked_mul` and maps overflow to
`None` (null). So Comet silently returns null where Spark would throw.
>
> Either return an error on overflow to match Spark, or document the
deviation and mark it `Incompatible` in the serde's `getSupportLevel`.
>
> ### NULL buffer handling
>
> All three input type paths build the output via `.iter().map().collect()`,
which constructs the null buffer row-by-row through Arrow's builder. For Int32
and Int64, no new nulls are introduced. The output nulls are identical to the
input nulls. The null buffer could be cloned directly and values computed with
Arrow's `unary` kernel or direct buffer arithmetic. This avoids per-row null
tracking overhead. Float64 is trickier since NaN/Infinity introduce new nulls,
so iterative construction is more justifiable there.
>
> ### Missing input types
>
> Spark accepts `NumericType` (Byte, Short, Int, Long, Float, Double,
Decimal). The PR only handles Int32, Int64, Float64. Float32 would be trivial
to add (cast to f64 and use the Float64 path). Decimal is more complex and
could be deferred.
>
> ### Test gaps
>
> A few cases worth adding:
> - NaN input: `SELECT timestamp_seconds(CAST('NaN' AS DOUBLE))` (should
return null)
> - Infinity input: `SELECT timestamp_seconds(CAST('Infinity' AS DOUBLE))`
(should return null)
> - Overflow: `SELECT timestamp_seconds(9223372036854775807)` (Spark throws,
what does Comet do?)
> - Int32 column reference (currently only `bigint` column in the test table)
> - Float64 column reference (currently only literal casts)
>
> ### Docs
>
> `docs/spark_expressions_support.md` has `timestamp_seconds` marked `[ ]`.
Should be updated to `[x]`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]