[ 
https://issues.apache.org/jira/browse/SPARK-57550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-57550:
-----------------------------
    Description: 
Umbrella for follow-up work extending support for the TIME data type (TimeType) 
introduced by the SPIP SPARK-51162. Collects TIME-related tasks such as casts 
to/from other types, columnar/Arrow support, Parquet/Variant interop, and 
statistics collection.

h2. Prioritization

Suggested ordering of the open sub-tasks by dependency, correctness impact, 
ANSI/user value, and existing momentum (open PRs).

h3. Dependencies / blockers

* SPARK-57551 (raise the maximum TIME precision to 9, i.e. nanoseconds) is the 
only hard blocker inside this umbrella: it gates SPARK-57552 and SPARK-57554.
* SPARK-57552 and SPARK-57554 additionally depend on the nanosecond TIMESTAMP 
types (TimestampNTZNanosType / TimestampLTZNanos), tracked outside this 
umbrella.
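
The blocker relationships above can be written down as a small dependency graph and sorted, which makes the forced ordering explicit. This is only a sketch: the ticket keys come from the list above, while {{nanos-TIMESTAMP}} is a placeholder label (not a real Jira key) for the nanosecond TIMESTAMP work tracked outside this umbrella.

```python
from graphlib import TopologicalSorter

# Each key depends on every ticket in its set (edges taken from the
# "Dependencies / blockers" list). "nanos-TIMESTAMP" is a placeholder
# label for the external nanosecond TIMESTAMP work, not a real ticket key.
deps = {
    "SPARK-57552": {"SPARK-57551", "nanos-TIMESTAMP"},
    "SPARK-57554": {"SPARK-57551", "nanos-TIMESTAMP"},
}

# static_order() yields prerequisites before the tickets that need them.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Any order the sorter returns places SPARK-57551 and the external nanos work before SPARK-57552 and SPARK-57554; tickets not listed in the graph are unconstrained.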

h3. Tier 1 - do first (correctness gaps; small, no dependencies)

TIME currently throws/fails in these paths, so they behave like bugs:
* SPARK-54203 - RowToColumnConverter: TIME hits unsupportedDataTypeError in 
row->column conversion (caching/vectorized paths). Best single first ticket.
* SPARK-54582 - stats serialization: CatalogColumnStat.toExternalString throws 
for TIME, so ANALYZE TABLE min/max persistence is broken.
* SPARK-57559 - add a TimeType case to PhysicalDataType: trivial robustness fix.

h3. Tier 2 - high ANSI/user value, no deps, momentum (PRs exist)

* SPARK-52617 - TIME <-> TIMESTAMP_NTZ (micros): ANSI-mandatory cast, highest 
everyday value (PR open).
* SPARK-54281 - numeric -> TIME: completes cast symmetry (PR open).
* SPARK-57553 - TIME <-> TIMESTAMP_LTZ (micros): finishes the ANSI cast matrix 
for the common timestamp type.
* SPARK-52621 - TIME <-> VARIANT (PR open); needs the encoding decision first.
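
As a rough illustration of what a TIME <-> TIMESTAMP_NTZ cast pair could mean at microsecond precision, the sketch below models the two types with Python's {{datetime.time}} and naive {{datetime.datetime}}. It is not Spark code. The function names are hypothetical, and taking the date part from a caller-supplied date (standing in for the current date) is an assumption; the actual semantics are whatever the tickets above settle on.

```python
from datetime import date, datetime, time


def timestamp_ntz_to_time(ts: datetime) -> time:
    """Keep the time-of-day fields of a zone-less timestamp, drop the date."""
    if ts.tzinfo is not None:
        raise ValueError("expected a naive (NTZ-like) datetime")
    return ts.time()


def time_to_timestamp_ntz(t: time, on: date) -> datetime:
    """Attach a date to a time of day. Which date to use is an assumption;
    here the caller passes it in, standing in for the current date."""
    return datetime.combine(on, t)


ts = datetime(2024, 1, 2, 3, 4, 5, 678901)
t = timestamp_ntz_to_time(ts)
round_trip = time_to_timestamp_ntz(t, ts.date())
```

With the original date supplied, the round trip is lossless at microsecond precision; with any other date only the time-of-day fields survive.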

h3. Tier 3 - foundational enabler for the nanosecond line

* SPARK-57551 - raise the maximum TIME precision to 9 (nanoseconds): 
highest-leverage enabler; unblocks SPARK-57552 / SPARK-57554 and aligns TIME 
with the in-flight nanos TIMESTAMP work and ANSI's "TIME and TIMESTAMP share 
the same max precision" rule. Start early if the nanosecond direction is a 
release priority.
* Then SPARK-57552 and SPARK-57554, once SPARK-57551 and the nanos TIMESTAMP 
types are in.
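
To make "precision 9" concrete, the arithmetic below treats a time of day as a count of nanoseconds since midnight and truncates it to a given number of fractional-second digits. This is a sketch under stated assumptions: the nanoseconds-since-midnight representation and the choice of truncation (rather than rounding) are illustrative, and {{truncate_to_precision}} is a hypothetical helper, not a Spark function.

```python
NANOS_PER_DAY = 24 * 60 * 60 * 10**9  # 86_400_000_000_000


def truncate_to_precision(nanos_of_day: int, precision: int) -> int:
    """Drop fractional-second digits beyond `precision` (0..9).

    Assumes the value is nanoseconds since midnight and that excess
    digits are truncated, not rounded.
    """
    if not 0 <= precision <= 9:
        raise ValueError("precision must be between 0 and 9")
    if not 0 <= nanos_of_day < NANOS_PER_DAY:
        raise ValueError("value is outside a single day")
    unit = 10 ** (9 - precision)  # size of one tick at this precision
    return nanos_of_day - nanos_of_day % unit


value = 123_456_789  # 0.123456789 seconds after midnight
micros = truncate_to_precision(value, 6)  # keeps microsecond digits only
nanos = truncate_to_precision(value, 9)   # full nanosecond precision
```

A full day of nanoseconds is about 8.64e13, far below 2^63, so a 64-bit signed integer has ample room for a nanosecond-precision time of day.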

h3. Tier 4 - valuable but independent / can run anytime

* SPARK-57555 - JDBC data source: biggest migration payoff (the SPIP 
motivation), but a larger multi-dialect effort; parallelize on its own track.
* SPARK-54507 - time_bucket (PR open).
* SPARK-57558 - LOCALTIME (small, ANSI).
* SPARK-57557 - quantile/sketch aggregates.

h3. Tier 5 - lower priority / niche / polish

* SPARK-53368 - Parquet isAdjustedToUTC=true (PR open, minor).
* SPARK-57560 - TRY-mode arithmetic.
* SPARK-57556 - Hive interop (Hive has no TIME; mostly a documented-limitation 
task).
* SPARK-51403 - ordered/atomic tests (starter).
* SPARK-57030 / SPARK-57031 - docs (do last, once behavior is settled).

h3. Bottom line

* Implement first: SPARK-54203 (smallest, no deps, closes a real failure path).
* In parallel, kick off: SPARK-57551 (foundational blocker for the nanosecond 
cast branch).
* Then drive to done: the ANSI cast tickets with existing PRs (SPARK-52617, 
SPARK-54281).

  was:Umbrella for follow-up work extending support for the TIME data type 
(TimeType) introduced by the SPIP SPARK-51162. Collects TIME-related tasks such 
as casts to/from other types, columnar/Arrow support, Parquet/Variant interop, 
and statistics collection.


> Extend support for the TIME data type
> -------------------------------------
>
>                 Key: SPARK-57550
>                 URL: https://issues.apache.org/jira/browse/SPARK-57550
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major


