[ 
https://issues.apache.org/jira/browse/SPARK-57250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić resolved SPARK-57250.
----------------------------------
    Resolution: Fixed

> Construct sub-microsecond timestamp typed literals with precision derived 
> from fractional digits
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57250
>                 URL: https://issues.apache.org/jira/browse/SPARK-57250
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>              Labels: pull-request-available
>
> h2. What
> Make the SQL typed-literal constructor in {{AstBuilder.visitTypeConstructor}} 
> produce
> nanosecond-capable timestamp literals when the literal string carries 7-9 
> fractional-second
> digits, instead of truncating to a microsecond {{Long}}. Following the ANSI 
> SQL rule, the
> fractional-second precision {{p}} of the literal is *derived from the number 
> of digits in the
> fractional part of the string* - there is no precision argument in the 
> literal syntax.
> Concretely, for a typed literal {{[TYPE] '<value>'}}:
> * 0-6 fractional digits -> unchanged: a microsecond literal of 
> {{TimestampType}} /
>   {{TimestampNTZType}} (current behavior; aligned with SPARK-57163).
> * 7-9 fractional digits -> a nanos literal:
>   {{Literal(TimestampNTZNanos(epochMicros, nanosWithinMicro), 
> TimestampNTZNanosType(p))}}
>   or the LTZ counterpart, where {{p}} is the fractional digit count.
> * more than 9 fractional digits -> a user-facing error (precision exceeds the 
> maximum of 9).
> Parent: SPARK-56822. Builds on SPARK-56876 (logical types), SPARK-56965 
> (parser names the
> types), SPARK-56981 (physical row storage), SPARK-57032 (string parsing for 
> 7-9 digits), and
> SPARK-57033 (java.time / composite conversion). Complements SPARK-57163 (p<=6 
> -> micro mapping)
> and the test-side SPARK-57165 (LiteralGenerator).
> h2. Why
> The parser can already *name* {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}} 
> types
> (SPARK-56965), but a typed *literal value* with sub-microsecond precision is 
> still silently
> narrowed to micros. Today {{visitTypeConstructor}} routes {{TIMESTAMP}} / 
> {{TIMESTAMP_NTZ}} /
> {{TIMESTAMP_LTZ}} through {{stringToTimestamp}} / 
> {{stringToTimestampWithoutTimeZone}}, which
> return epoch-microsecond {{Long}}s, so {{TIMESTAMP_NTZ '2020-01-01 
> 00:00:00.123456789'}} loses
> digits 7-9. Typed literals are the most direct way for users to introduce 
> constant nanos
> values in SQL (filters, INSERT ... VALUES, constant folding), so this 
> unblocks the minimum
> user-visible SQL surface for the preview feature.
> h2. ANSI SQL behavior to follow
> Per ISO/IEC 9075-2 (SQL/Foundation):
> * Grammar: {{<timestamp literal> ::= TIMESTAMP <timestamp string>}}, where 
> the fractional part
>   is {{<seconds fraction> ::= <unsigned integer>}} (no fixed digit limit, and 
> *no precision
>   token in the literal*).
> * Subclause 5.3, Syntax Rule 27: the declared type of a {{<timestamp 
> literal>}} is
>   {{TIMESTAMP(P)}}, "where P is the number of digits in <seconds fraction>". 
> WITH vs WITHOUT
>   TIME ZONE is determined by whether the string contains a {{<time zone 
> interval>}} (offset).
> * Subclause 6.1, Syntax Rules 34 and 36: default {{<timestamp precision>}} is 
> 6; the maximum is
>   implementation-defined and "not less than 6". Spark caps it at 9 
> (nanoseconds).
> This matches Spark's existing literal grammar, which has no precision 
> argument:
> {{literalType singleStringLitWithoutMarker}} with {{literalType}} one of
> {{TIMESTAMP | TIMESTAMP_NTZ | TIMESTAMP_LTZ}}. So no grammar change is 
> required; only the
> value-construction side of {{visitTypeConstructor}} changes.
> h2. Scope
> {{sql/catalyst/.../parser/AstBuilder.scala}} ({{visitTypeConstructor}}):
> * Determine the fractional digit count of the literal string and pick micro 
> vs nanos accordingly.
> * For 7-9 digits, build {{TimestampNTZNanos}} / {{TimestampLTZNanos}} values 
> via the
>   SPARK-57032 nanos parse helpers and wrap them in {{Literal(..., 
> TimestampNTZNanosType(p))}} /
>   {{TimestampLTZNanosType(p)}}.
> * Keep the existing NTZ/LTZ resolution for the bare {{TIMESTAMP}} keyword: 
> when the string
>   contains a time-zone offset, resolve to the LTZ (WITH TIME ZONE) variant; 
> otherwise NTZ. The
>   explicit {{TIMESTAMP_NTZ}} / {{TIMESTAMP_LTZ}} keywords pin the variant.
> * Validate precision: more than 9 fractional digits raises the existing 
> user-facing
>   invalid-precision / cannot-parse error.
> * Gate behind {{spark.sql.timestampNanosTypes.enabled}}. When the flag is 
> off, retain current
>   microsecond behavior (parse and narrow at 6 digits) exactly as today.
> * Reuse {{convertSpecialTimestamp}} / {{convertSpecialTimestampNTZ}} for 
> special values (these
>   remain micro).
> h2. Out of scope
> * Custom-format / pattern-based literal parsing (covered by the nanos 
> TimestampFormatter,
>   SPARK-57162).
> * Casts and type coercion for nanos types (separate cast-matrix and coercion 
> subtasks).
> * {{LiteralGenerator}} test support (SPARK-57165).
> * Any new literal syntax with an explicit precision token (e.g. 
> {{TIMESTAMP_NTZ(9) '...'}}) -
>   not ANSI and not in scope.
> h2. Design notes
> * Precision is *implied*, never declared, in the literal - count the digits 
> after the decimal
>   point in the seconds field.
> * {{TimestampNTZNanosType}} / {{TimestampLTZNanosType}} require precision in 
> [7, 9]; literals
>   with 0-6 fractional digits therefore cannot be nanos types and stay micro 
> by construction.
> * Reuse the {{TimestampNanosVal}} normalization invariant 
> ({{nanosWithinMicro}} in [0, 999]).
> * No rounding of sub-precision digits is needed here because {{p}} equals the 
> exact digit count;
>   truncation/rounding rules belong to cast/format paths.
> h2. How was this patch tested
> * New cases in the parser/expression literal suites: parse round-trip for 
> {{TIMESTAMP_NTZ}} /
>   {{TIMESTAMP_LTZ}} / {{TIMESTAMP}} literals with 7, 8, and 9 fractional 
> digits; assert the
>   resulting {{Literal}} dataType is {{TimestampNTZNanosType(p)}} / 
> {{TimestampLTZNanosType(p)}}
>   and the composite value matches a java.time oracle (SPARK-57033 helpers).
> * Boundary cases: {{nanosWithinMicro}} 0 and 999, pre-epoch instants (e.g. 
> 1582 cutover),
>   9999-12-31, and exactly 6 digits (must stay micro).
> * Negative: more than 9 fractional digits raises the expected error; behavior 
> with the preview
>   flag disabled is unchanged.
> * TIMESTAMP-keyword NTZ-vs-LTZ resolution based on presence of a time-zone 
> offset.
> h2. Does this PR introduce any user-facing change
> Yes, but gated. When {{spark.sql.timestampNanosTypes.enabled}} is true, typed 
> timestamp literals
> with 7-9 fractional digits resolve to nanosecond-capable types instead of 
> being narrowed to
> microseconds, e.g.:
> {code:sql}
> SELECT TIMESTAMP_NTZ '2020-01-01 00:00:00.123456789';  -- 
> TimestampNTZNanosType(9)
> {code}
> When the flag is false (production default), behavior is unchanged.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to