[ 
https://issues.apache.org/jira/browse/SPARK-57164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-57164.
------------------------------
    Fix Version/s: 4.3.0
       Resolution: Fixed

Issue resolved by pull request 56514
[https://github.com/apache/spark/pull/56514]

> Add parser test coverage for nanosecond-capable timestamp types across all data-type string entry points
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57164
>                 URL: https://issues.apache.org/jira/browse/SPARK-57164
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Tests
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Minor
>              Labels: pull-request-available, starter
>             Fix For: 4.3.0
>
>
> h2. What
> Add focused test coverage asserting that the nanosecond-capable timestamp
> spellings ({{TIMESTAMP_NTZ(p)}}, {{TIMESTAMP_LTZ(p)}}, and the
> {{TIMESTAMP(p) WITH[OUT] [LOCAL] TIME ZONE}} aliases, p in [7, 9]) parse
> consistently across every public string-to-DataType entry point, and that
> out-of-range precisions are rejected identically everywhere.
> h2. Why
> This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond
> precision).
> Spark parses data-type strings through two independent parser families:
> * *Family A - ANTLR {{DataTypeAstBuilder}}*: the bare/zoned {{TIMESTAMP(p)}}
>   handling lives in one place, but it is reached through many distinct public
>   surfaces (see below). Each surface is a separate user-facing contract.
> * *Family B - JSON {{nameToType}}* in {{DataType.scala}}: a second,
>   hand-maintained parser with its own {{TIMESTAMP_LTZ_NANOS_TYPE}} /
>   {{TIMESTAMP_NTZ_NANOS_TYPE}} regex branches. This is where precision/error
>   semantics can silently drift from Family A.
> Today the nanos parsing is exercised mainly via
> {{CatalystSqlParser.parseDataType}} in {{DataTypeParserSuite}}. The other public
> entry points have no explicit assertions, so a regression on any one of them
> (or drift between Family A and Family B) would go unnoticed.
> h2. Entry points to cover
> Family A (ANTLR {{DataTypeAstBuilder}}):
> * {{DataType.fromDDL}} and {{StructType.fromDDL}}
> * {{StructType.add(name, "TIMESTAMP_NTZ(9)")}}
> * {{Column.cast(String)}} and {{Column.try_cast(String)}}
> * {{DataFrameReader.schema(String)}} (and {{DataStreamReader.schema(String)}})
> * {{SparkSession.sessionState.sqlParser.parseDataType(String)}} - the programmatic
>   catalog-string entry point
> * DDL/SQL schema strings passed to {{from_json}} / {{from_csv}} / {{from_xml}}
>   (XML is a built-in datasource; {{from_xml}} takes a schema string just like the
>   other two)
> * SQL via the full {{AstBuilder}}: {{CAST(x AS TIMESTAMP_NTZ(9))}},
>   {{TRY_CAST(x AS TIMESTAMP_LTZ(9))}}, {{CREATE TABLE ... c TIMESTAMP_LTZ(7)}},
>   {{ALTER TABLE ... ADD COLUMNS (c TIMESTAMP_NTZ(9))}}, {{ALTER TABLE ... ALTER COLUMN}},
>   and a column {{DEFAULT}} declared with a nanos type
> Shared wrapper (bridges Family A and Family B):
> * {{DataType.parseTypeWithFallback}} - the DDL-then-JSON fallback used by
>   {{DataFrameReader.schema(String)}} and the {{from_*}} expressions. Asserting it
>   directly is the single best guard against Family A and Family B drifting.
> Family B (JSON):
> * {{DataType.fromJson}} / {{DataTypeJsonUtils}} round-trip
>   ({{typeName}}/{{json}} <-> {{DataType}})
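> A minimal sketch of how a few of these entry points could be asserted side by
> side. It assumes the nanos types live in {{org.apache.spark.sql.types}} and are
> constructed from the precision alone, as the {{TimestampNTZNanosType(p)}}
> notation in this ticket suggests. The suite name is hypothetical; a real patch
> would extend the existing suites instead.
> {code}
> import org.apache.spark.SparkFunSuite
> import org.apache.spark.sql.catalyst.plans.SQLHelper
> import org.apache.spark.sql.internal.SQLConf
> import org.apache.spark.sql.types._
>
> // Hypothetical suite name, for illustration only.
> class TimestampNanosEntryPointsSketch extends SparkFunSuite with SQLHelper {
>   test("nanos spellings agree across DDL and JSON entry points") {
>     withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") {
>       Seq(7, 8, 9).foreach { p =>
>         // Assumption: the nanos types take the precision as their only argument.
>         val ntz: DataType = TimestampNTZNanosType(p)
>         val ltz: DataType = TimestampLTZNanosType(p)
>
>         // Family A: DDL strings parsed by the ANTLR-based builder.
>         assert(DataType.fromDDL(s"TIMESTAMP_NTZ($p)") === ntz)
>         assert(DataType.fromDDL(s"TIMESTAMP($p) WITH LOCAL TIME ZONE") === ltz)
>         assert(StructType.fromDDL(s"c TIMESTAMP_LTZ($p)")("c").dataType === ltz)
>         assert(new StructType().add("c", s"TIMESTAMP_NTZ($p)")("c").dataType === ntz)
>
>         // Family B: JSON round-trip through DataType.fromJson.
>         assert(DataType.fromJson(ntz.json) === ntz)
>         assert(DataType.fromJson(ltz.json) === ltz)
>       }
>     }
>   }
> }
> {code}
> Keeping the Family A and Family B assertions in the same loop makes any drift
> between the two parsers fail for the same precision value.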
> h2. Acceptance criteria
> * For p in {7, 8, 9}, every entry point above resolves:
> ** {{TIMESTAMP_NTZ(p)}} -> {{TimestampNTZNanosType(p)}}
> ** {{TIMESTAMP_LTZ(p)}} -> {{TimestampLTZNanosType(p)}}
> ** {{TIMESTAMP(p) WITHOUT TIME ZONE}} -> {{TimestampNTZNanosType(p)}}
> ** {{TIMESTAMP(p) WITH LOCAL TIME ZONE}} -> {{TimestampLTZNanosType(p)}}
> ** {{TIMESTAMP(p)}} (bare) -> {{TimestampLTZNanosType(p)}} or {{TimestampNTZNanosType(p)}}
>    depending on {{spark.sql.timestampType}} (assert both config values)
> * All entry points reject out-of-range precision (e.g. {{(6)}}, {{(10)}})
>   with {{INVALID_TIMESTAMP_PRECISION}}, with identical parameters across
>   Family A and Family B. (If the separate {{TIMESTAMP_*(6)}} mapping task has
>   landed, update the {{(6)}} expectations to the microsecond types instead.)
> * All entry points reject the spellings with {{FEATURE_NOT_ENABLED}} when
>   {{spark.sql.timestampNanosTypes.enabled = false}}.
> * A round-trip test confirms Family B agrees with Family A:
>   {{DataType.fromJson(t.json)}} == {{t}} for the nanos types, and the
>   {{typeName}} of a nanos type re-parses to the same type.
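> The bare {{TIMESTAMP(p)}} and out-of-range criteria above could be sketched as
> follows. This is an illustration under assumptions, not the merged test: the
> suite name is hypothetical, the nanos types are assumed to be constructed from
> the precision alone, and the rejection is assumed to surface as a
> {{SparkThrowable}} whose condition is exactly {{INVALID_TIMESTAMP_PRECISION}}.
> {code}
> import org.apache.spark.{SparkFunSuite, SparkThrowable}
> import org.apache.spark.sql.catalyst.plans.SQLHelper
> import org.apache.spark.sql.internal.SQLConf
> import org.apache.spark.sql.types._
>
> // Hypothetical suite name, for illustration only.
> class TimestampNanosAcceptanceSketch extends SparkFunSuite with SQLHelper {
>   private val nanosEnabled = SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true"
>
>   test("bare TIMESTAMP(p) follows spark.sql.timestampType") {
>     Seq(7, 8, 9).foreach { p =>
>       withSQLConf(nanosEnabled, "spark.sql.timestampType" -> "TIMESTAMP_NTZ") {
>         assert(DataType.fromDDL(s"TIMESTAMP($p)") === TimestampNTZNanosType(p))
>       }
>       withSQLConf(nanosEnabled, "spark.sql.timestampType" -> "TIMESTAMP_LTZ") {
>         assert(DataType.fromDDL(s"TIMESTAMP($p)") === TimestampLTZNanosType(p))
>       }
>     }
>   }
>
>   test("out-of-range precision is rejected by each entry point") {
>     withSQLConf(nanosEnabled) {
>       // Each function wraps one public string-to-DataType entry point.
>       val entryPoints: Seq[String => Any] = Seq(
>         ddl => DataType.fromDDL(ddl),
>         ddl => StructType.fromDDL(s"c $ddl"))
>       entryPoints.foreach { parse =>
>         val e = intercept[Exception] { parse("TIMESTAMP_NTZ(10)") }
>         e match {
>           // Assumption: the error is a SparkThrowable with this exact condition.
>           case st: SparkThrowable =>
>             assert(st.getCondition === "INVALID_TIMESTAMP_PRECISION")
>           case other => fail(s"Expected a SparkThrowable, got: $other")
>         }
>       }
>     }
>   }
> }
> {code}
> Only {{(10)}} is used for the rejection case here, since the expectation for
> {{(6)}} depends on whether the separate mapping task has landed.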
> h2. Where to add tests
> * {{sql/catalyst/.../parser/DataTypeParserSuite.scala}} - {{fromDDL}},
>   {{StructType.fromDDL}}, {{StructType.add(String)}}.
> * {{sql/catalyst/.../types/DataTypeSuite.scala}} - {{fromJson}}/{{json}}
>   round-trip (Family B).
> * {{Column.cast(String)}} / {{DataFrameReader.schema(String)}} /
>   {{from_json}} / {{from_csv}} / {{from_xml}} DDL-schema cases in the appropriate
>   {{sql/core}} suite (gated by the preview flag via {{withSQLConf}}).
> * {{DataType.parseTypeWithFallback}} direct assertions (DDL path and JSON fallback).
> h2. Out of scope
> * Behavior changes. This task only adds assertions for the current contract
>   (any intended behavior change for {{p}} = 6 is handled by its own task).
> * Spark Connect proto conversion (tracked separately under SPARK-57160 /
>   SPARK-57161).
> * Related parse entry points that flow through the same parser but whose
>   datasources reject nanos today; covered by their own tasks: JDBC {{customSchema}}
>   option (SPARK-57460), ORC catalyst-type attribute round-trip (SPARK-57455), and
>   Hive metastore type strings.
> h2. Notes for first-time contributors
> Good first issue - test-only, no production code changes. Enable the preview
> flag in tests with:
> {code}
> withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") { ... }
> {code}
> Run the affected catalyst suites with SBT:
> {code}
> build/sbt 'catalyst/testOnly *DataTypeParserSuite *DataTypeSuite'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
