[jira] [Updated] (SPARK-57551) Extend the TIME data type precision to nanoseconds (up to 9)

Max Gekk (Jira) Fri, 19 Jun 2026 06:16:28 -0700


     [ https://issues.apache.org/jira/browse/SPARK-57551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Max Gekk updated SPARK-57551:
-----------------------------
    Description: 
h2. What

Extend the fractional-seconds precision of the {{TIME}} data type from the current
maximum of 6 (microseconds) to 9 (nanoseconds). After this change {{TIME(p)}} accepts
{{0 <= p <= 9}}.

h2. Why

* Internal storage is *already* nanoseconds-since-midnight ({{Long}}), introduced by
  SPARK-52460. {{TimeType.NANOS_PRECISION = 9}} is already defined; only the cap
  {{TimeType.MAX_PRECISION = 6}} prevents using it.
* ANSI SQL (ISO/IEC 9075-2, 6.1 <data type>) makes the maximum {{<time precision>}}
  implementation-defined with the sole constraint that it is *not less than 6*, and
  Syntax Rule 36 requires the maximum of {{<time precision>}} and {{<timestamp precision>}}
  to be *the same* implementation-defined value.
* This worktree already supports nanosecond timestamps via {{TimestampNTZNanosType}} /
  {{TimestampLTZNanos}} (precision 7..9). To stay ANSI-consistent, {{TIME}} must reach
  precision 9 in lockstep.
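
The storage claim in the first bullet can be checked with plain integer arithmetic. The following is a minimal standalone sketch, not Spark code: {{to_nanos_of_day}} is a hypothetical helper that encodes a wall-clock time as nanoseconds since midnight, and the constants are defined locally for illustration.

```python
NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_DAY = 24 * 60 * 60 * NANOS_PER_SECOND
INT64_MAX = 2**63 - 1  # upper bound of a signed 64-bit Long


def to_nanos_of_day(hour: int, minute: int, second: int, nanos: int = 0) -> int:
    """Encode a wall-clock time as nanoseconds since midnight (hypothetical helper)."""
    in_range = (
        0 <= hour < 24
        and 0 <= minute < 60
        and 0 <= second < 60
        and 0 <= nanos < NANOS_PER_SECOND
    )
    if not in_range:
        raise ValueError("time field out of range")
    return ((hour * 60 + minute) * 60 + second) * NANOS_PER_SECOND + nanos


# The largest representable TIME value is one nanosecond before midnight.
max_value = to_nanos_of_day(23, 59, 59, 999_999_999)
fits_in_long = max_value <= INT64_MAX
```

Since the largest value is far below the signed 64-bit limit, the existing representation needs no widening for precision 9; only the precision cap has to move.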

h2. Scope

* Lift {{TimeType.MAX_PRECISION}} from 6 to 9 and update precision validation in
  {{TimeType}} and {{DataTypeAstBuilder}}.
* Update {{SparkDateTimeUtils.truncateTimeToPrecision}} (and its {{<= MAX_PRECISION}}
  assertion) to support p in 7..9.
* Time formatters/parsers ({{TimeFormatter}}, {{FractionTimeFormatter}}) must format and
  parse 7..9 fractional digits.
* Parquet I/O: the writer currently emits the {{TIME(MICROS)}} logical type; emit
  {{TIME(NANOS)}} for p in 7..9 and read it back ({{TimeTypeParquetOps}},
  {{ParquetSchemaConverter}}, {{ParquetWriteSupport}}, {{ParquetRowConverter}},
  vectorized reader).
* Verify casts already implemented for TIME (TIME(p1)->TIME(p2), TIME->DECIMAL,
  TIME->integral, STRING<->TIME) behave correctly for p in 7..9.
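
To make the truncation and Parquet items above concrete, here is a rough Python sketch of the arithmetic involved. It is not the Spark implementation: the function names only mirror the identifiers mentioned in this ticket, the constants are redefined locally, and the sketch assumes that truncation drops extra digits without rounding and that precisions 0..6 keep using {{TIME(MICROS)}}.

```python
NANOS_PRECISION = 9   # digits in a nanosecond fraction
MICROS_PRECISION = 6  # digits in a microsecond fraction
MAX_PRECISION = 9     # proposed cap (6 before this ticket)


def check_precision(precision: int) -> None:
    """Reject precisions outside the proposed [0, 9] range."""
    if not 0 <= precision <= MAX_PRECISION:
        raise ValueError(f"precision must be in [0, {MAX_PRECISION}], got {precision}")


def truncate_time_to_precision(nanos_of_day: int, precision: int) -> int:
    """Zero out fractional digits beyond `precision` (truncates, never rounds)."""
    check_precision(precision)
    step = 10 ** (NANOS_PRECISION - precision)
    return nanos_of_day - nanos_of_day % step


def parquet_time_unit(precision: int) -> str:
    """Pick the Parquet TIME unit: MICROS up to 6 digits, NANOS for 7..9."""
    check_precision(precision)
    return "MICROS" if precision <= MICROS_PRECISION else "NANOS"


sample = 45_296_123_456_789  # 12:34:56.123456789 as nanoseconds since midnight
truncated = {p: truncate_time_to_precision(sample, p) for p in (0, 6, 7, 9)}
units = {p: parquet_time_unit(p) for p in (0, 6, 7, 9)}
```

With {{MAX_PRECISION}} at 9, precision 9 leaves the value untouched and precision 7 keeps the first seven fractional digits, which is the behavior the cast checks in the last bullet would need to confirm.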

h2. Out of scope

* Casts to/from TIMESTAMP types (tracked separately).
* TIME WITH TIME ZONE (non-goal per SPARK-51162).

h2. Acceptance criteria

* {{TIME(7)}}, {{TIME(8)}}, {{TIME(9)}} can be declared, parsed, and used as literals.
* Round-trip through Parquet preserves nanosecond values.
* Existing TIME tests pass; new tests cover the 7..9 range.
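
The parse and round-trip criteria above can be illustrated with a small standalone sketch. {{format_time}} and {{parse_time}} are hypothetical helpers, not the {{TimeFormatter}} API, and the sketch assumes a fixed number of fractional digits is printed (the real formatters may handle trailing zeros differently).

```python
NANOS_PER_SECOND = 1_000_000_000


def format_time(nanos_of_day: int, precision: int) -> str:
    """Render nanoseconds since midnight as HH:mm:ss with `precision` fraction digits."""
    if not 0 <= precision <= 9:
        raise ValueError(f"precision must be in [0, 9], got {precision}")
    seconds, nanos = divmod(nanos_of_day, NANOS_PER_SECOND)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    text = f"{hour:02d}:{minute:02d}:{second:02d}"
    if precision > 0:
        # Take the leading `precision` digits of the 9-digit nanosecond fraction.
        text += "." + f"{nanos:09d}"[:precision]
    return text


def parse_time(text: str) -> int:
    """Parse HH:mm:ss[.fraction] with up to 9 fraction digits into nanoseconds since midnight."""
    hms, _, fraction = text.partition(".")
    hour, minute, second = (int(part) for part in hms.split(":"))
    if len(fraction) > 9 or (fraction and not fraction.isdigit()):
        raise ValueError(f"invalid fractional seconds in {text!r}")
    # Right-pad so that ".1234567" means 123456700 nanoseconds.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return ((hour * 60 + minute) * 60 + second) * NANOS_PER_SECOND + nanos


value = parse_time("12:34:56.123456789")
round_trip = format_time(value, 9)
```

A test for the 7..9 range would assert that parsing and re-formatting at the declared precision returns the original text, as {{round_trip}} does here for precision 9.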

h2. Test impact (max precision 6 -> 9)

Tests that hard-code precision 6 will need updating. The breaking ones (assert >6 is invalid)
should be fixed in this ticket; broader 7-9 coverage is tracked by SPARK-57563.

h3. MUST-UPDATE (assert precision > 6 is invalid; will fail/flip)

* DataTypeParserSuite.scala (test "unsupported precision of the time data type"): time(8)/time(9)
  currently expect UNSUPPORTED_TIME_PRECISION -> become valid; move the invalid case to time(10)
  and add valid 7/8/9 parse cases.
* DataTypeSuite.scala (test "Parse time(n) as TimeType(n)"): extend the {{0 to 6}} loop to 0..9;
  {{DataType.fromJson("time(9)")}} expects INVALID_JSON_DATA_TYPE -> must parse; move invalid JSON
  to time(10). (The {{MAX_PRECISION + 1}} invalid-range check auto-adjusts.)
* TimeExpressionsSuite.scala (CurrentTime range check, ~lines 318-327): expected
  valueRange "[0, 6]" (from MICROS_PRECISION) -> "[0, 9]"; also switch the production current_time
  precision check from MICROS_PRECISION to MAX_PRECISION and add valid current_time(7/8/9).
h3. MUST-UPDATE (enumerate 0..6 as "all precisions"; won't error but miss the new range)

* TimeFunctionsSuiteBase.scala (current_time {{(0 to 6)}} loop).
* AvroSuite.scala / AvroFunctionsSuite.scala (precision 0-6 loops; time-micros logical type).
* OrcQuerySuite.scala (TIME(0)..TIME(6) casts + {{0 to 6}} assert loop).
* from_/to_ function suites with testData precisions 0-6: CsvFunctionsSuite, JsonFunctionsSuite,
  CsvExpressionsSuite, JsonExpressionsSuite, XmlExpressionsSuite, XmlFunctionsSuite.

h3. LIKELY-UPDATE (pass today; need 7-9 cases / nanosecond expectations)

* Generators capped at micros: LiteralGenerator.scala, RandomDataGenerator.scala,
  DateTimeTestUtils.localTime(..., micros).
* TimeFormatterSuite (HH:mm:ss.SSSSSS, 999999, "TIME(6)" error text) and DateTimeUtilsSuite cast
  error text.
* CastSuiteBase (TIME(p1)->TIME(p2), TIME->DECIMAL): loops auto-expand via MAX_PRECISION but input
  values are micros-only; add 7-9 fractional cases.
* Parquet/ORC/Avro micros assumptions: ParquetIOSuite, TimeTypeParquetOpsSuite (INT64 TIME(MICROS)
  -> needs TIME(NANOS) for 7-9), AvroSuite, PartitionedWriteSuite.
* Loops named via MICROS_PRECISION that mean "all valid precisions" -> switch to MAX_PRECISION:
  TimeExpressionsSuite, RowJsonSuite, DataTypeTestUtils (timeTypes), SparkConnectPlannerSuite.
* SQL golden: sql-tests/inputs/time.sql + results/time.sql.out nanosecond-input truncation cases
  (regenerate via SQLQueryTestSuite). Tracked in SPARK-57563.

h3. INFORMATIONAL (safe if DEFAULT_PRECISION stays 6)

* ~100+ time(6)/TimeType(6) samples, current_time(6) name checks, sql-expression-schema.md, and
  PySpark tests referencing time(6). ArrowConvertersSuite is already nanosecond-aware.

Note: this list assumes DEFAULT_PRECISION remains 6 (only MAX_PRECISION moves to 9). Changing the
default would additionally churn the informational set.

  was:
h2. What

Extend the fractional-seconds precision of the {{TIME}} data type from the current
maximum of 6 (microseconds) to 9 (nanoseconds). After this change {{TIME(p)}} accepts
{{0 <= p <= 9}}.

h2. Why

* Internal storage is *already* nanoseconds-since-midnight ({{Long}}), introduced by
  SPARK-52460. {{TimeType.NANOS_PRECISION = 9}} is already defined; only the cap
  {{TimeType.MAX_PRECISION = 6}} prevents using it.
* ANSI SQL (ISO/IEC 9075-2, 6.1 <data type>) makes the maximum {{<time precision>}}
  implementation-defined with the sole constraint that it is *not less than 6*, and
  Syntax Rule 36 requires the maximum of {{<time precision>}} and {{<timestamp precision>}}
  to be *the same* implementation-defined value.
* This worktree already supports nanosecond timestamps via {{TimestampNTZNanosType}} /
  {{TimestampLTZNanos}} (precision 7..9). To stay ANSI-consistent, {{TIME}} must reach
  precision 9 in lockstep.

h2. Scope

* Lift {{TimeType.MAX_PRECISION}} from 6 to 9 and update precision validation in
  {{TimeType}} and {{DataTypeAstBuilder}}.
* Update {{SparkDateTimeUtils.truncateTimeToPrecision}} (and its {{<= MAX_PRECISION}}
  assertion) to support p in 7..9.
* Time formatters/parsers ({{TimeFormatter}}, {{FractionTimeFormatter}}) must format and
  parse 7..9 fractional digits.
* Parquet I/O: the writer currently emits the {{TIME(MICROS)}} logical type; emit
  {{TIME(NANOS)}} for p in 7..9 and read it back ({{TimeTypeParquetOps}},
  {{ParquetSchemaConverter}}, {{ParquetWriteSupport}}, {{ParquetRowConverter}},
  vectorized reader).
* Verify casts already implemented for TIME (TIME(p1)->TIME(p2), TIME->DECIMAL,
  TIME->integral, STRING<->TIME) behave correctly for p in 7..9.

h2. Out of scope

* Casts to/from TIMESTAMP types (tracked separately).
* TIME WITH TIME ZONE (non-goal per SPARK-51162).

h2. Acceptance criteria

* {{TIME(7)}}, {{TIME(8)}}, {{TIME(9)}} can be declared, parsed, and used as literals.
* Round-trip through Parquet preserves nanosecond values.
* Existing TIME tests pass; new tests cover the 7..9 range.


> Extend the TIME data type precision to nanoseconds (up to 9)
> ------------------------------------------------------------
>
>                 Key: SPARK-57551
>                 URL: https://issues.apache.org/jira/browse/SPARK-57551
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
