[ https://issues.apache.org/jira/browse/SPARK-57551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Max Gekk reassigned SPARK-57551:
--------------------------------
Assignee: Max Gekk
> Extend the TIME data type precision to nanoseconds (up to 9)
> ------------------------------------------------------------
>
> Key: SPARK-57551
> URL: https://issues.apache.org/jira/browse/SPARK-57551
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> h2. What
> Extend the fractional-seconds precision of the {{TIME}} data type from the
> current maximum of 6 (microseconds) to 9 (nanoseconds). After this change,
> {{TIME(p)}} accepts {{0 <= p <= 9}}.
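The precision rule above can be sketched as a small range check. This is only an illustration in Python, not Spark's code: the names `MAX_PRECISION`, `validate_time_precision`, and `parse_time_type` are hypothetical stand-ins, and the error type is an assumption.

```python
import re

MIN_PRECISION = 0
MAX_PRECISION = 9  # proposed cap from this issue; the current cap is 6


def validate_time_precision(p: int) -> int:
    """Return p if it is a valid fractional-seconds precision, else raise."""
    if not (MIN_PRECISION <= p <= MAX_PRECISION):
        raise ValueError(
            f"unsupported TIME precision {p}; "
            f"expected [{MIN_PRECISION}, {MAX_PRECISION}]"
        )
    return p


def parse_time_type(text: str) -> int:
    """Parse a type string such as 'time(7)' and return its precision."""
    match = re.fullmatch(r"\s*time\(\s*(\d+)\s*\)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a TIME(p) type string: {text!r}")
    return validate_time_precision(int(match.group(1)))
```

Under this sketch, `parse_time_type("time(9)")` is accepted and `parse_time_type("time(10)")` is rejected, which mirrors the boundary the test updates below move from 6 to 9.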
> h2. Why
> * Internal storage is *already* nanoseconds-since-midnight ({{Long}}),
> introduced by SPARK-52460. {{TimeType.NANOS_PRECISION = 9}} is already
> defined; only the cap {{TimeType.MAX_PRECISION = 6}} prevents using it.
> * ANSI SQL (ISO/IEC 9075-2, 6.1 <data type>) makes the maximum
> {{<time precision>}} implementation-defined with the sole constraint that it
> is *not less than 6*, and Syntax Rule 36 requires the maximum of
> {{<time precision>}} and {{<timestamp precision>}} to be *the same*
> implementation-defined value.
> * This worktree already supports nanosecond timestamps via
> {{TimestampNTZNanosType}} / {{TimestampLTZNanos}} (precision 7..9). To stay
> ANSI-consistent, {{TIME}} must reach precision 9 in lockstep.
> h2. Scope
> * Lift {{TimeType.MAX_PRECISION}} from 6 to 9 and update precision validation
> in {{TimeType}} and {{DataTypeAstBuilder}}.
> * Update {{SparkDateTimeUtils.truncateTimeToPrecision}} (and its
> {{<= MAX_PRECISION}} assertion) to support p in 7..9.
> * Time formatters/parsers ({{TimeFormatter}}, {{FractionTimeFormatter}}) must
> format and parse 7..9 fractional digits.
> * Parquet I/O: the writer currently emits the {{TIME(MICROS)}} logical type;
> emit {{TIME(NANOS)}} for p in 7..9 and read it back ({{TimeTypeParquetOps}},
> {{ParquetSchemaConverter}}, {{ParquetWriteSupport}}, {{ParquetRowConverter}},
> vectorized reader).
> * Verify that the casts already implemented for TIME (TIME(p1)->TIME(p2),
> TIME->DECIMAL, TIME->integral, STRING<->TIME) behave correctly for p in 7..9.
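To make the truncation and Parquet items above concrete, here is a minimal Python sketch that operates on nanoseconds-since-midnight values. It assumes truncation toward zero (as the name {{truncateTimeToPrecision}} suggests) and the unit split described in the Parquet bullet; the function names are hypothetical and do not mirror the actual Spark implementation.

```python
NANOS_PER_SECOND = 10**9
MAX_PRECISION = 9  # proposed cap from this issue


def truncate_time_to_precision(nanos_of_day: int, p: int) -> int:
    """Drop fractional digits beyond precision p (truncation, not rounding)."""
    assert 0 <= p <= MAX_PRECISION, f"precision {p} out of range"
    unit = 10 ** (9 - p)  # size of one step at precision p, in nanoseconds
    return nanos_of_day - nanos_of_day % unit


def parquet_time_unit(p: int) -> str:
    """Pick the Parquet TIME unit: MICROS up to precision 6, NANOS for 7..9."""
    assert 0 <= p <= MAX_PRECISION, f"precision {p} out of range"
    return "NANOS" if p > 6 else "MICROS"


def to_parquet_value(nanos_of_day: int, p: int) -> int:
    """Convert an internal nanosecond value to the chosen Parquet unit."""
    truncated = truncate_time_to_precision(nanos_of_day, p)
    return truncated if parquet_time_unit(p) == "NANOS" else truncated // 1000


# 12:34:56.123456789 expressed as nanoseconds since midnight
sample = (12 * 3600 + 34 * 60 + 56) * NANOS_PER_SECOND + 123_456_789
```

With `sample` as above, truncating to precision 6 yields 45296123456000, precision 7 yields 45296123456700, and precision 9 leaves the value unchanged. Writing a precision 7..9 value with the MICROS unit would silently drop the last three digits, which is why the NANOS unit is needed for a lossless round trip.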
> h2. Out of scope
> * Casts to/from TIMESTAMP types (tracked separately).
> * TIME WITH TIME ZONE (non-goal per SPARK-51162).
> h2. Acceptance criteria
> * {{TIME(7)}}, {{TIME(8)}}, {{TIME(9)}} can be declared, parsed, and used as
> literals.
> * Round-trip through Parquet preserves nanosecond values.
> * Existing TIME tests pass; new tests cover the 7..9 range.
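As a rough illustration of the declare/parse/format criteria, the following Python sketch formats and parses nanoseconds-since-midnight values with up to 9 fractional digits. The helper names are hypothetical, and the choice to print exactly p digits is an assumption made for the sketch; the real {{TimeFormatter}} / {{FractionTimeFormatter}} behavior (for example, how trailing zeros are handled) may differ.

```python
NANOS_PER_SECOND = 10**9


def format_time(nanos_of_day: int, p: int) -> str:
    """Render HH:mm:ss with exactly p fractional digits (0 <= p <= 9)."""
    if not 0 <= p <= 9:
        raise ValueError(f"precision {p} out of range [0, 9]")
    seconds, frac_nanos = divmod(nanos_of_day, NANOS_PER_SECOND)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    base = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    if p == 0:
        return base
    # Zero-pad to 9 digits, then keep the leading p digits (truncation).
    return f"{base}.{frac_nanos:09d}"[: len(base) + 1 + p]


def parse_time(text: str) -> int:
    """Parse HH:mm:ss[.fraction] into nanoseconds since midnight."""
    hms, _, fraction = text.partition(".")
    hours, minutes, secs = (int(part) for part in hms.split(":"))
    if len(fraction) > 9:
        raise ValueError("at most 9 fractional digits are supported")
    # Right-pad the fraction so '1234567' means 123456700 nanoseconds.
    frac_nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return (hours * 3600 + minutes * 60 + secs) * NANOS_PER_SECOND + frac_nanos
```

A round trip such as `parse_time(format_time(v, 9)) == v` is the kind of property the new 7..9 tests would be expected to check.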
> h2. Test impact (max precision 6 -> 9)
> Tests that hard-code precision 6 will need updating. The breaking ones (those
> asserting that a precision > 6 is invalid) should be fixed in this ticket;
> broader 7-9 coverage is tracked by SPARK-57563.
> h3. MUST-UPDATE (assert precision > 6 is invalid; will fail/flip)
> * DataTypeParserSuite.scala (test "unsupported precision of the time data
> type"): time(8)/time(9) currently expect UNSUPPORTED_TIME_PRECISION -> become
> valid; move the invalid case to time(10) and add valid 7/8/9 parse cases.
> * DataTypeSuite.scala (test "Parse time(n) as TimeType(n)"): extend the
> {{0 to 6}} loop to 0..9; {{DataType.fromJson("time(9)")}} expects
> INVALID_JSON_DATA_TYPE -> must parse; move invalid JSON to time(10). (The
> {{MAX_PRECISION + 1}} invalid-range check auto-adjusts.)
> * TimeExpressionsSuite.scala (CurrentTime range check, ~lines 318-327):
> expected valueRange "[0, 6]" (from MICROS_PRECISION) -> "[0, 9]"; also switch
> the production current_time precision check from MICROS_PRECISION to
> MAX_PRECISION and add valid current_time(7/8/9).
> h3. MUST-UPDATE (enumerate 0..6 as "all precisions"; won't error but miss the new range)
> * TimeFunctionsSuiteBase.scala (current_time {{(0 to 6)}} loop).
> * AvroSuite.scala / AvroFunctionsSuite.scala (precision 0-6 loops; time-micros
> logical type).
> * OrcQuerySuite.scala (TIME(0)..TIME(6) casts + {{0 to 6}} assert loop).
> * from_/to_ function suites with testData precisions 0-6: CsvFunctionsSuite,
> JsonFunctionsSuite, CsvExpressionsSuite, JsonExpressionsSuite,
> XmlExpressionsSuite, XmlFunctionsSuite.
> h3. LIKELY-UPDATE (pass today; need 7-9 cases / nanosecond expectations)
> * Generators capped at micros: LiteralGenerator.scala,
> RandomDataGenerator.scala, DateTimeTestUtils.localTime(..., micros).
> * TimeFormatterSuite (HH:mm:ss.SSSSSS, 999999, "TIME(6)" error text) and
> DateTimeUtilsSuite cast error text.
> * CastSuiteBase (TIME(p1)->TIME(p2), TIME->DECIMAL): loops auto-expand via
> MAX_PRECISION but input values are micros-only; add 7-9 fractional cases.
> * Parquet/ORC/Avro micros assumptions: ParquetIOSuite, TimeTypeParquetOpsSuite
> (INT64 TIME(MICROS) -> needs TIME(NANOS) for 7-9), AvroSuite,
> PartitionedWriteSuite.
> * Loops named via MICROS_PRECISION that mean "all valid precisions" -> switch
> to MAX_PRECISION: TimeExpressionsSuite, RowJsonSuite, DataTypeTestUtils
> (timeTypes), SparkConnectPlannerSuite.
> * SQL golden: sql-tests/inputs/time.sql + results/time.sql.out
> nanosecond-input truncation cases (regenerate via SQLQueryTestSuite). Tracked
> in SPARK-57563.
> h3. INFORMATIONAL (safe if DEFAULT_PRECISION stays 6)
> * ~100+ time(6)/TimeType(6) samples, current_time(6) name checks,
> sql-expression-schema.md, and PySpark tests referencing time(6).
> ArrowConvertersSuite is already nanosecond-aware.
> Note: this list assumes DEFAULT_PRECISION remains 6 (only MAX_PRECISION moves
> to 9). Changing the default would additionally churn the informational set.