[jira] [Assigned] (SPARK-57661) Preserve TIME precision in the Spark <-> Arrow type mapping

Max Gekk (Jira) Thu, 25 Jun 2026 23:12:05 -0700


     [ https://issues.apache.org/jira/browse/SPARK-57661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Max Gekk reassigned SPARK-57661:
--------------------------------

    Assignee: Max Gekk

> Preserve TIME precision in the Spark <-> Arrow type mapping
> -----------------------------------------------------------
>
>                 Key: SPARK-57661
>                 URL: https://issues.apache.org/jira/browse/SPARK-57661
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>              Labels: pull-request-available
>
> h2. What
> Carry the {{TimeType(p)}} fractional-second precision {{p}} (in [0, 9]) 
> across the Spark <-> Arrow type mapping so that a {{TIME(p)}} column 
> round-trips back to the same {{TIME(p)}}, instead of collapsing to the 
> canonical {{TIME(6)}}.
> h2. Why
> {{ArrowUtils}} / the Types Framework currently map every {{TimeType(p)}} to 
> {{ArrowType.Time(TimeUnit.NANOSECOND, 64)}} (no precision field), and 
> {{TypeApiOps.fromArrowType}} maps {{ArrowType.Time(NANOSECOND, 64)}} back to 
> a fixed {{TimeType(TimeType.MICROS_PRECISION)}} (= 6). As a result the 
> declared precision is lost on any Arrow round-trip ({{TIME(0)}}, {{TIME(3)}}, 
> {{TIME(9)}}, ... all read back as {{TIME(6)}}), so Arrow-based schema 
> transfer (Connect schema/results, createDataFrame from Arrow, mapInArrow, 
> etc.) silently widens or narrows the type label. The stored value is already 
> nanosecond-resolution and is unaffected; this is purely a type-fidelity gap.
> Arrow's {{Time}} logical type only encodes (unit, bitWidth) and has no 
> fractional-precision field, so the precision cannot live in the {{ArrowType}} 
> itself. It can, however, be carried in the Arrow {{Field}} metadata, the same 
> channel Spark already uses to reconstruct parameterized logical types 
> (Geometry/Geography recover {{srid}}; the nanosecond timestamp types carry 
> their precision under {{SPARK::timestampNanos::precision}} per SPARK-57159).
> h2. Scope
> * {{sql/api/.../util/ArrowUtils.scala}}: in {{toArrowField}}, tag 
> {{TimeType(p)}} fields with the precision metadata key 
> {{SPARK::time::precision}}, merged with the column metadata; in 
> {{fromArrowField}}, read that key to reconstruct {{TimeType(p)}}.
> * {{sql/api/.../types/ops/TimeTypeApiOps.scala}} and 
> {{TypeApiOps.fromArrowType}}: keep {{toArrowType}} producing 
> {{Time(NANOSECOND, 64)}}; keep the metadata-less {{fromArrowType}} as the 
> canonical fallback.
> * Reuse the precision-in-field-metadata pattern introduced for the nanosecond 
> timestamp types (SPARK-57159) for consistency.
> h2. Behavior on read-back
> * Metadata present: reconstruct the exact {{TimeType(p)}}.
> * Metadata absent (foreign Arrow data) or precision value outside [0, 9]: 
> fall back to the current canonical {{TimeType(MICROS_PRECISION)}} (= 6), 
> preserving today's behavior for non-Spark producers.
> h2. Out of scope
> * Value semantics / rounding: values are carried verbatim at nanosecond 
> resolution; no change to how {{TIME(p)}} values are truncated (that already 
> happens upstream).
> * PySpark Arrow/pandas conversion and Spark Connect proto/converters (separate 
> sub-tasks), beyond what the shared {{ArrowUtils}} mapping provides.
> h2. How tested
> {{ArrowUtilsSuite}}: round-trip {{TIME(p)}} for {{p}} in {0, 3, 6, 9} 
> preserves {{p}}; a {{Time(NANOSECOND)}} field with no precision metadata 
> falls back to {{TIME(6)}}; the precision key does not leak into the 
> reconstructed column {{Metadata}}.
> h2. Does this introduce any user-facing change?
> Yes (minor): a {{TIME(p)}} column transferred over Arrow now retains its 
> declared precision instead of always reading back as {{TIME(6)}}. No change 
> to stored values.
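
The round-trip described in the issue can be sketched with a small, self-contained model. This is not the Spark or Arrow API: `TimeType`, `ArrowField`, `to_arrow_field` and `from_arrow_field` below are hypothetical stand-ins written in plain Python, and the Arrow field is reduced to a type label plus a string-to-string metadata map. Only the metadata key `SPARK::time::precision`, the [0, 9] range and the fallback to precision 6 come from the issue text. Treating a non-integer metadata value like an out-of-range one is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Values taken from the issue text.
PRECISION_KEY = "SPARK::time::precision"
MICROS_PRECISION = 6            # canonical fallback precision
MIN_PRECISION, MAX_PRECISION = 0, 9


@dataclass(frozen=True)
class TimeType:
    """Hypothetical stand-in for Spark's TimeType(p)."""
    precision: int = MICROS_PRECISION


@dataclass(frozen=True)
class ArrowField:
    """Hypothetical stand-in for an Arrow Field: the type carries only
    (unit, bitWidth), so the precision has to travel in the metadata."""
    name: str
    arrow_type: str = "time64[ns]"
    metadata: Dict[str, str] = field(default_factory=dict)


def to_arrow_field(name: str, dt: TimeType,
                   column_metadata: Optional[Dict[str, str]] = None) -> ArrowField:
    """Tag the field with the precision, merged with the column metadata."""
    merged = dict(column_metadata or {})
    merged[PRECISION_KEY] = str(dt.precision)
    return ArrowField(name, "time64[ns]", merged)


def from_arrow_field(f: ArrowField) -> Tuple[TimeType, Dict[str, str]]:
    """Rebuild the TimeType and the column metadata from a field."""
    # The precision key is transport-only and must not leak into the column metadata.
    column_metadata = {k: v for k, v in f.metadata.items() if k != PRECISION_KEY}
    raw = f.metadata.get(PRECISION_KEY)
    try:
        p = int(raw) if raw is not None else None
    except ValueError:
        p = None  # assumption: an unparsable value is treated as absent
    if p is None or not (MIN_PRECISION <= p <= MAX_PRECISION):
        return TimeType(MICROS_PRECISION), column_metadata
    return TimeType(p), column_metadata


# Round-trip: the declared precision survives and the key does not leak.
for p in (0, 3, 6, 9):
    dt, meta = from_arrow_field(
        to_arrow_field("t", TimeType(p), {"comment": "shift start"}))
    assert dt == TimeType(p)
    assert meta == {"comment": "shift start"}

# Foreign Arrow data without the key falls back to the canonical precision.
assert from_arrow_field(ArrowField("t"))[0] == TimeType(MICROS_PRECISION)
```

Keeping the Arrow type fixed at nanosecond resolution and putting the precision in field metadata means readers that ignore the key still see a valid time column, which matches the fallback behavior listed under "Behavior on read-back".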



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
