Max Gekk created SPARK-57164:
--------------------------------
Summary: Add parser test coverage for nanosecond-capable timestamp
types across all data-type string entry points
Key: SPARK-57164
URL: https://issues.apache.org/jira/browse/SPARK-57164
Project: Spark
Issue Type: Sub-task
Components: SQL, Tests
Affects Versions: 4.3.0
Reporter: Max Gekk
h2. What
Add focused test coverage asserting that the nanosecond-capable timestamp
spellings ({{TIMESTAMP_NTZ(p)}}, {{TIMESTAMP_LTZ(p)}}, and the
{{TIMESTAMP(p) WITHOUT TIME ZONE}} / {{TIMESTAMP(p) WITH LOCAL TIME ZONE}}
aliases, for p from 7 to 9 inclusive) parse consistently across every public
string-to-DataType entry point, and that out-of-range precisions are rejected
identically everywhere.
h2. Why
This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).
Spark parses data-type strings through two independent parser families:
* *Family A - ANTLR {{DataTypeAstBuilder}}*: the bare/zoned {{TIMESTAMP(p)}}
handling lives in one place, but it is reached through many distinct public
surfaces (see below). Each surface is a separate user-facing contract.
* *Family B - JSON {{nameToType}}* in {{DataType.scala}}: a second,
hand-maintained parser with its own {{TIMESTAMP_LTZ_NANOS_TYPE}} /
{{TIMESTAMP_NTZ_NANOS_TYPE}} regex branches. This is where precision/error
semantics can silently drift from Family A.
Today the nanos parsing is exercised mainly via
{{CatalystSqlParser.parseDataType}} in {{DataTypeParserSuite}}. The other public
entry points have no explicit assertions, so a regression on any one of them
(or drift between Family A and Family B) would go unnoticed.
h2. Entry points to cover
Family A (ANTLR {{DataTypeAstBuilder}}):
* {{DataType.fromDDL}} and {{StructType.fromDDL}}
* {{StructType.add(name, "TIMESTAMP_NTZ(9)")}}
* {{Column.cast(String)}} and {{Column.try_cast(String)}}
* {{DataFrameReader.schema(String)}} (and {{DataStreamReader.schema(String)}})
* DDL/SQL schema strings passed to {{from_json}} / {{from_csv}}
* SQL via the full {{AstBuilder}}: {{CAST(x AS TIMESTAMP_NTZ(9))}},
{{CREATE TABLE ... c TIMESTAMP_LTZ(7)}}
Family B (JSON):
* {{DataType.fromJson}} / {{DataTypeJsonUtils}} round-trip
({{typeName}}/{{json}} <-> {{DataType}})
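As a rough illustration of how the catalyst-level Family A surfaces listed above could share one set of assertions, the sketch below drives them from a table of parse functions. It is untested and rests on assumptions not stated in this ticket: that {{TimestampNTZNanosType}} lives in {{org.apache.spark.sql.types}} and is constructed as {{TimestampNTZNanosType(p)}}, and that {{SQLHelper.withSQLConf}} is the way the flag is toggled in catalyst suites. The suite name is hypothetical.
{code:scala}
import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.SQLHelper
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._

// Hypothetical suite name; real cases would be added to the existing suites.
class NanosTimestampEntryPointSketch extends SparkFunSuite with SQLHelper {

  // Label -> function that turns a bare data-type string into a DataType.
  // The struct-based surfaces wrap the type in a one-column schema and unwrap it.
  private val entryPoints: Seq[(String, String => DataType)] = Seq(
    "CatalystSqlParser.parseDataType" ->
      ((s: String) => CatalystSqlParser.parseDataType(s)),
    "DataType.fromDDL" ->
      ((s: String) => DataType.fromDDL(s)),
    "StructType.fromDDL" ->
      ((s: String) => StructType.fromDDL(s"c $s")("c").dataType),
    "StructType.add(name, String)" ->
      ((s: String) => new StructType().add("c", s)("c").dataType)
  )

  entryPoints.foreach { case (label, parse) =>
    test(s"$label parses TIMESTAMP_NTZ(9)") {
      withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") {
        // Assumption: TimestampNTZNanosType(p) is the constructor form
        // implied by the notation used in this ticket.
        assert(parse("TIMESTAMP_NTZ(9)") === TimestampNTZNanosType(9))
      }
    }
  }
}
{code}
Registering one test per entry point keeps a failure attributable to a single surface instead of hiding it inside a loop.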
h2. Acceptance criteria
* For p in {7, 8, 9}, every entry point above resolves:
** {{TIMESTAMP_NTZ(p)}} -> {{TimestampNTZNanosType(p)}}
** {{TIMESTAMP_LTZ(p)}} -> {{TimestampLTZNanosType(p)}}
** {{TIMESTAMP(p) WITHOUT TIME ZONE}} -> {{TimestampNTZNanosType(p)}}
** {{TIMESTAMP(p) WITH LOCAL TIME ZONE}} -> {{TimestampLTZNanosType(p)}}
* All entry points reject out-of-range precision (e.g. {{(6)}}, {{(10)}})
with {{INVALID_TIMESTAMP_PRECISION}}, with identical parameters across
Family A and Family B. (If the separate {{TIMESTAMP_*(6)}} mapping task has
landed, update the {{(6)}} expectations to the microsecond types instead.)
* All entry points reject the spellings with {{FEATURE_NOT_ENABLED}} when
{{spark.sql.timestampNanosTypes.enabled = false}}.
* A round-trip test confirms Family B agrees with Family A:
{{DataType.fromJson(t.json)}} == {{t}} for the nanos types, and the
{{typeName}} of a nanos type re-parses to the same type.
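The criteria above could be encoded roughly as follows. This is a sketch, not a reviewed patch: the suite name is hypothetical, the constructor forms {{TimestampNTZNanosType(p)}} / {{TimestampLTZNanosType(p)}} and their package are assumed from the notation in this ticket, and the error checks compare only the condition name through {{SparkThrowable.getCondition}} because the exact message parameters are not given here.
{code:scala}
import org.apache.spark.{SparkFunSuite, SparkThrowable}
import org.apache.spark.sql.catalyst.plans.SQLHelper
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._

// Hypothetical suite name, used for illustration only.
class NanosTimestampAcceptanceSketch extends SparkFunSuite with SQLHelper {

  private val flag = SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key

  // Spelling -> expected type for one in-range precision.
  private def expectations(p: Int): Seq[(String, DataType)] = Seq(
    s"TIMESTAMP_NTZ($p)" -> TimestampNTZNanosType(p),
    s"TIMESTAMP_LTZ($p)" -> TimestampLTZNanosType(p),
    s"TIMESTAMP($p) WITHOUT TIME ZONE" -> TimestampNTZNanosType(p),
    s"TIMESTAMP($p) WITH LOCAL TIME ZONE" -> TimestampLTZNanosType(p)
  )

  test("in-range precisions resolve to the nanos types") {
    withSQLConf(flag -> "true") {
      for (p <- 7 to 9; (spelling, expected) <- expectations(p)) {
        assert(DataType.fromDDL(spelling) === expected, s"spelling: $spelling")
      }
    }
  }

  test("out-of-range precisions are rejected") {
    withSQLConf(flag -> "true") {
      // Drop 6 from this list if the TIMESTAMP_*(6) mapping task has landed.
      Seq(6, 10).foreach { p =>
        val e = intercept[SparkThrowable] {
          DataType.fromDDL(s"TIMESTAMP_NTZ($p)")
        }
        assert(e.getCondition === "INVALID_TIMESTAMP_PRECISION")
      }
    }
  }

  test("spellings are rejected when the preview flag is off") {
    withSQLConf(flag -> "false") {
      val e = intercept[SparkThrowable] {
        DataType.fromDDL("TIMESTAMP_NTZ(9)")
      }
      assert(e.getCondition === "FEATURE_NOT_ENABLED")
    }
  }

  test("JSON round-trip returns the same nanos type") {
    withSQLConf(flag -> "true") {
      for (p <- 7 to 9;
           t <- Seq[DataType](TimestampNTZNanosType(p), TimestampLTZNanosType(p))) {
        assert(DataType.fromJson(t.json) === t)
      }
    }
  }
}
{code}
A fuller version would repeat the first three tests for each Family A entry point and also compare {{getMessageParameters}} between the two families, as the criteria require.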
h2. Where to add tests
* {{sql/catalyst/.../parser/DataTypeParserSuite.scala}} - {{fromDDL}},
{{StructType.fromDDL}}, {{StructType.add(String)}}.
* {{sql/catalyst/.../types/DataTypeSuite.scala}} - {{fromJson}}/{{json}}
round-trip (Family B).
* {{Column.cast(String)}} / {{DataFrameReader.schema(String)}} /
{{from_json}} DDL-schema cases in the appropriate {{sql/core}} suite
(gated by the preview flag via {{withSQLConf}}).
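For the {{sql/core}} cases, a gated test might look like the sketch below. It covers only the two cast surfaces, which can be checked from the result schema without reading any input. It assumes that a NULL literal can be cast to the nanos type and that {{TimestampNTZNanosType(9)}} is the constructor form; the suite name is hypothetical.
{code:scala}
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.test.SharedSparkSession
import org.apache.spark.sql.types._

// Hypothetical suite name, used for illustration only.
class NanosTimestampCoreSketch extends QueryTest with SharedSparkSession {

  test("cast surfaces resolve TIMESTAMP_NTZ(9)") {
    withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") {
      // Assumption: constructor form implied by the notation in this ticket.
      val expected = TimestampNTZNanosType(9)

      // Column.cast(String): the type string goes through the data-type parser.
      val viaColumn =
        spark.range(1).select(lit(null).cast("TIMESTAMP_NTZ(9)").as("c"))
      assert(viaColumn.schema("c").dataType === expected)

      // SQL CAST: the same spelling, parsed as part of a full SQL statement.
      val viaSql = spark.sql("SELECT CAST(NULL AS TIMESTAMP_NTZ(9)) AS c")
      assert(viaSql.schema("c").dataType === expected)
    }
  }
}
{code}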
h2. Out of scope
* Behavior changes. This task only adds assertions for the current contract
(any intended behavior change for {{p}} = 6 is handled by its own task).
* Spark Connect proto conversion (tracked separately under SPARK-57160 /
SPARK-57161).
h2. Notes for first-time contributors
Good first issue - test-only, no production code changes. Enable the preview
flag in tests with:
{code}
withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") { ... }
{code}
Run the affected catalyst suites with SBT:
{code}
build/sbt 'catalyst/testOnly *DataTypeParserSuite *DataTypeSuite'
{code}