[
https://issues.apache.org/jira/browse/SPARK-56822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Max Gekk updated SPARK-56822:
-----------------------------
Description:
h1. Q1. What are you trying to do? Articulate your objectives using absolutely
no jargon.
Add nanosecond-capable timestamps with an explicit fractional precision *n* in
SQL and APIs, while keeping a simple binary value: epoch microseconds (64-bit)
+ nanoseconds within that microsecond (0–999, stored in 16 bits).
h3. SQL surface
Support TIMESTAMP_NTZ(n), TIMESTAMP_LTZ(n), and TIMESTAMP(n) in the parser
(including equivalent spellings: WITHOUT TIME ZONE, WITH LOCAL TIME ZONE, etc.)
so precision is first-class in the grammar, not a side convention. The
parameter n is optional, and its valid range is [0, 9]. It defines how many
decimal digits of the fractional second are part of the type. For example: n =
6 -> microseconds, n = 9 -> nanoseconds.
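For illustration only, a DDL sketch of the proposed spellings. Nothing below parses on current Spark releases; the table and column names are invented, and spark is assumed to be an active SparkSession.
{code:scala}
// Hypothetical DDL using the proposed parameterized spellings.
spark.sql("""
  CREATE TABLE events (
    id       BIGINT,
    ts_ntz   TIMESTAMP_NTZ(9),                -- nanosecond precision, no time zone
    ts_ltz   TIMESTAMP_LTZ(9),                -- nanosecond precision, local time zone
    ts_alias TIMESTAMP(9) WITHOUT TIME ZONE,  -- same meaning as TIMESTAMP_NTZ(9)
    ts_old   TIMESTAMP_NTZ                    -- unparameterized: today's microsecond type
  ) USING parquet
""")
{code}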
h3. Scala / Java APIs
Introduce parameterized catalyst types, e.g. TimestampNTZNanosType(n) for
TIMESTAMP_NTZ(n) and TimestampNanosType(n) for TIMESTAMP_LTZ(n).
h3. In-memory value
The internal representation of both types is long micros since the epoch +
short (or 16-bit) nanos-in-micro in [0, 999]. Both NTZ and LTZ have the same
representation; time zone only affects interpretation for LTZ, not the pair
layout.
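For intuition, a minimal sketch of that decomposition in plain Scala; the helper name is illustrative, not proposed API.
{code:scala}
import java.time.{LocalDateTime, ZoneOffset}

// Decompose a LocalDateTime into the proposed logical pair:
// epoch microseconds (Long) plus nanoseconds within that microsecond (0..999).
def toMicrosAndNanos(ldt: LocalDateTime): (Long, Short) = {
  val nanosOfSecond = ldt.getNano                               // 0..999999999
  val epochMicros = Math.addExact(
    Math.multiplyExact(ldt.toEpochSecond(ZoneOffset.UTC), 1000000L),
    (nanosOfSecond / 1000).toLong)
  val nanosOfMicro = (nanosOfSecond % 1000).toShort             // 0..999
  (epochMicros, nanosOfMicro)
}

// 2024-01-01T00:00:00.123456789 -> (1704067200123456L, 789)
val pair = toMicrosAndNanos(LocalDateTime.parse("2024-01-01T00:00:00.123456789"))
{code}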
h1. Q2. What problem is this proposal NOT designed to solve?
* *Precision below the nanosecond target range.* This SPIP covers only high-precision parameterized timestamps where 7 <= n <= 9. Support/changes for n < 7 are explicitly out of scope for this proposal.
* *Subtraction of timestamps.* The result should be a day-time interval, but that data type currently has only microsecond precision.
* *Changing existing time zone semantics.* It does not redefine NTZ/LTZ
semantics, session time zone behavior, or SQL time zone rules.
* *A full rework of all connectors/storage formats.* It does not promise
end-to-end nanosecond support in every external system; connector-specific
follow-ups may still be needed.
h1. Q3. How is it done today, and what are the limits of current practice?
Today Spark SQL supports two built-in timestamp types, both at microsecond
precision:
* TimestampType (TIMESTAMP WITH LOCAL TIME ZONE)
* TimestampNTZType (TIMESTAMP WITHOUT TIME ZONE)
So Spark’s native timestamp model stops at 6 fractional digits.
For nanosecond data (for example Parquet TIMESTAMP(NANOS, ...)), current
behavior is limited:
* Default behavior: Spark rejects it with an analysis error (for example:
Illegal Parquet type: INT64 (TIMESTAMP(NANOS,true))).
* Legacy fallback (spark.sql.legacy.parquet.nanosAsLong=true): Spark reads it
as LongType (raw integer), which drops timestamp semantics:
** no timestamp/date function behavior
** no time zone semantics
** no timestamp type safety
In practice this means users either:
* down-convert data to microseconds before Spark, or
* keep nanosecond values as raw integers and do manual conversion logic, or
* switch to another engine for nanosecond timestamp workloads.
This is increasingly painful because nanosecond timestamps are common in data
produced by systems like Pandas/PyArrow, Trino, ClickHouse, and DuckDB, and in
domains such as market data, IoT telemetry, and tracing/observability pipelines.
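For reference, a sketch of the manual workaround this forces today, assuming an active SparkSession spark and a nanosecond column named ts_nanos (both placeholders):
{code:scala}
// Today's escape hatch: read Parquet TIMESTAMP(NANOS, ...) columns as LongType.
spark.conf.set("spark.sql.legacy.parquet.nanosAsLong", "true")
val raw = spark.read.parquet("/path/to/nanos_data")

// Manual, lossy rescaling from epoch nanoseconds to a microsecond timestamp;
// the last three fractional digits are truncated away.
val asMicros = raw.selectExpr("timestamp_micros(ts_nanos div 1000) AS ts")
{code}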
h1. Q4. What is new in your approach and why do you think it will be successful?
h2. What is new
* *Precision parameterization for timestamp types*
Add TIMESTAMP_NTZ(n), TIMESTAMP_LTZ(n), and TIMESTAMP(n) (including alias
forms) with explicit fractional precision, focused on 7 <= n <= 9.
* *New catalyst/API types for nanos-capable timestamps*
Introduce parameterized nanos-aware types instead of overloading existing
microsecond-only types.
** For TIMESTAMP_NTZ(n): case class TimestampNTZNanosType(precision: Int)
** For TIMESTAMP_LTZ(n): case class TimestampLTZNanosType(precision: Int)
* *Precise internal value model without changing core epoch semantics.*
Represent each value as:
** epoch microseconds (long)
** nanoseconds within the microsecond (0..999)
This preserves existing Spark timestamp foundations while adding sub-micro precision.
* *End-to-end engine integration, not just parser support*
The change is wired through parser/type system, expression evaluation, codegen,
unsafe/container paths, and file-format handling paths needed for practical use.
h2. Why this should succeed
# Low semantic risk: extends existing timestamp families instead of redefining
them.
# Backward compatible path: existing microsecond types and behavior remain
unchanged.
# Incremental implementation: narrow scope (7..9) and clear boundaries make
rollout testable.
# Real interoperability value: directly addresses common nanos data sources
where Spark currently fails or degrades to LongType.
# Operationally practical: keeps compact representation and reuses current
execution architecture, so performance/regression risk is manageable.
h1. Q5. Who cares? If you are successful, what difference will it make?
h2. Who cares
* Data engineering teams ingesting Parquet/warehouse data produced with
nanosecond timestamps.
* Platform teams running mixed ecosystems (Spark +
Trino/ClickHouse/DuckDB/Pandas/PyArrow).
* Domain teams where sub-micro timing matters (market data, telemetry,
observability, IoT, CDC/event streams).
* Connector and table-format maintainers who currently need compatibility
workarounds.
h2. What difference success makes
# Interop works by default. Spark can read/use high-precision timestamp data
as timestamp types, instead of failing or forcing LongType fallback.
# No more semantic loss workarounds. Users avoid manual bigint-to-timestamp
conversions, custom UDF glue, and loss of timezone/type semantics.
# Correctness for high-frequency data. Distinct events within the same
microsecond stay distinct; ordering and time-window logic become more reliable
for nanos data.
# Lower migration friction to Spark. Teams can bring existing nanos datasets
into Spark pipelines without pre-normalization to micros.
# Cleaner long-term type story. Spark gets an explicit precision model for
timestamps (n) rather than implicit microsecond-only behavior, making schema
contracts clearer across SQL and APIs.
h1. Q6. What are the risks?
|Risk|Mitigation|
|*User confusion* from more timestamp spellings (TIMESTAMP(n),
TIMESTAMP_NTZ(n), TIMESTAMP_LTZ(n), plus WITHOUT TIME ZONE / WITH LOCAL TIME
ZONE variants).|Keep today’s types as the default; require explicit n for the
new behavior; document a small “cheat sheet” mapping SQL spelling -> semantics
+ precision.|
|{*}Wrong results in shared datetime code paths{*}: functions/rules that
implicitly assume microsecond-only timestamps (AnyTimestampType-style plumbing,
codegen, optimizer rewrites) may mishandle nanosecond-capable
values.|Systematic audit of shared abstractions (casts, comparisons, intervals,
extract, codegen paths) plus focused regression tests on n ∈ [7,9].|
|*Performance regressions* (wider projections, more branches in
codegen/vectorized paths).|Benchmark common scans/joins/aggregations; keep fast
paths for existing microsecond timestamps.|
|*Range / overflow issues tied to widening.* Bugs where code accidentally
converts to a single epochNanos long during promotion or builtins.|Document the
representable range for the chosen internal representation; enforce bounds at
cast boundaries; tests for edge instants.|
|*Interop / external formats (Parquet/Iceberg/etc.):* external encodings may
use epoch nanoseconds in int64, while Spark uses (micros, nanosWithinMicro).
Conversion bugs are likely near boundaries and for pushdown
predicates.|Conversions must be explicitly specified (including acceptable
rounding/truncation) and covered by tests for the supported read/write paths
you ship.|
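To make the range/overflow row above concrete: accidentally promoting today's microsecond values to a single epoch-nanoseconds long already overflows for instants past ~2262 (plain Scala):
{code:scala}
// Promoting epoch microseconds to a single epoch-nanoseconds Long overflows
// for any instant later than ~2262-04-11 (or earlier than ~1677-09-21).
val epochMicros2500 =
  java.time.Instant.parse("2500-01-01T00:00:00Z").getEpochSecond * 1000000L
Math.multiplyExact(epochMicros2500, 1000L)  // throws ArithmeticException: long overflow
{code}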
h1. Q7. How long will it take?
This estimate covers shipping of parameterized nanosecond-capable timestamps (7
<= n <= 9) with feature parity to existing TimestampType / TimestampNTZType,
including parser, core type system, Parquet nanos support, encoders/converters,
datetime utilities, cast matrix (interpreted + codegen), expression updates,
type coercion, literals, and testing.
Implementation should integrate with the Types Framework (SPARK-53504):
register the new types through the centralized TypeOps / TypeApiOps (and
storage/client Ops as applicable) instead of scattering one-off integration
across dozens of files.
h3. Engineering estimate (person-weeks)
|Area|pw|
|Core type system (parameterization, physical dispatch for p<=6 vs p>=7,
equality/hash/order + catalyst plumbing)|2|
|SQL parser / AST (TIMESTAMP…(p) + aligned spellings)|1|
|Parquet nanos read/write (incl. vectorized + non-vectorized, rebasing, legacy
migration hooks, preview flags)|4|
|Encoders / converters (framework-aligned)|1.5|
|Datetime utils (nano parsing/conversion/arithmetic helpers)|1|
|Cast matrix (interpreted + codegen, precision rules)|2|
|Expression parity work (datetime builtins impacted)|5|
|Type coercion / widening (TypeCoercion / AnsiTypeCoercion)|3|
|Literals|1|
|Testing (unit + Parquet + overflow/range + ANSI vs non-ANSI)|2|
|Types Framework wiring + tests for the new type|1.5|
|Total|24.0|
Calendar time (rule of thumb): ~24 person-weeks is about:
* ~6 months for one senior engineer (calendar), or
* ~7–9 weeks wall time for two engineers mostly dedicated,
plus ~2–4 extra weeks calendar buffer for OSS review/CI churn and rebases.
h1. Q8. What are the mid-term and final “exams” to check for success?
h3. Mid-term “exams”. Users can declare, load, and analyze nanosecond-capable
timestamps without hacks:
# *SQL types are real, not experimental stubs.*
Users can declare schemas with TIMESTAMP…(p) / TIMESTAMP_NTZ(p) /
TIMESTAMP_LTZ(p) for p ∈ [7,9] (including equivalent WITHOUT TIME ZONE / WITH
LOCAL TIME ZONE spellings), and those types round-trip through CREATE TABLE /
CAST / DESCRIBE / explain plans in a predictable way.
# *Nanosecond timestamps stop degrading into “just BIGINT”.*
Users can read common nanosecond Parquet inputs as timestamps, without being
forced into LongType / legacy escape hatches for the supported paths you ship
in this milestone.
# *Everyday analytics works on the new types.*
For the shipped surface area, users can run typical workflows end-to-end:
filters/joins/group-by keys, casts, timestamp arithmetic,
extract/truncate-style operations.
# *Defaults unchanged for existing users.*
Existing microsecond-first workloads behave as today unless users explicitly
opt into p ∈ [7,9] or ingest nanosecond-native sources that trigger the new
behavior.
h3. Final “exams”. Workflows are consistently usable, documented, and
migration-ready at the level users expect from Spark timestamps today:
# *Parity exam vs existing Spark timestamps (user-visible).*
For TimestampType / TimestampNTZType, users already expect a broad set of
behaviors. For p ∈ [7,9], the shipped release must meet the same practical
usability standard on the supported operations list (no “half the functions
silently downgrade precision or fail inconsistently”).
# *Interop exam.*
Users can run realistic pipelines: read nanosecond-rich Parquet, transform, and
write/publish results without mandatory pre-processing outside Spark—within the
explicitly documented guarantees (lossless round-trip everywhere is not
required, but what is guaranteed must be testably true).
# *Migration exam.*
There is a published migration path away from workflows that today rely on
spark.sql.legacy.parquet.nanosAsLong (including what changes in schema types
and what users must do).
# *Documentation exam.*
Public docs answer: what p means, 7–9 scope, range/overflow behavior, ANSI vs
non-ANSI, and known limitations—written so support/engineering can deflect
repeated confusion.
h1. Appendix A. Proposed API Changes
h2. A.1 SQL language
Extend fractional-second precision (FSP) on timestamp families
|*Surface*|*Proposed syntax*|*Role*|
|NTZ nanos-capable|TIMESTAMP_NTZ(n)|Explicit NTZ with fractional precision
{*}n{*}.|
|Alias (NTZ)|TIMESTAMP(n) WITHOUT TIME ZONE|Same NTZ meaning as
TIMESTAMP_NTZ(n) (aligned with SQL-style spelling).|
|LTZ nanos-capable|TIMESTAMP_LTZ(n)|Explicit LTZ / “instant” timeline with n.|
|Alias (LTZ)|TIMESTAMP(n) WITH LOCAL TIME ZONE|Same LTZ meaning as
TIMESTAMP_LTZ(n).|
|Session default timestamp form|TIMESTAMP(n)|Where Spark today resolves
TIMESTAMP to LTZ or NTZ per configuration, n attaches FSP to that choice.|
Backward compatibility
* Unparameterized forms keep today’s meaning:
** TIMESTAMP_NTZ -> existing TimestampNTZType (microsecond semantics).
** TIMESTAMP_LTZ / TIMESTAMP WITHOUT TIME ZONE / WITH LOCAL TIME ZONE combinations -> existing TimestampType / TimestampNTZType behavior unchanged when (n) is omitted.
* New behavior appears only when (n) is present (and, per SPIP scope,
especially 7 <= n <= 9), mapping to the new catalyst types below - not silent
widening of old types.
h2. A.2 Scala / Java DataType APIs
New case classes:
|Type|Purpose|Notes|
|TimestampNTZNanosType(n: Int)|SQL TIMESTAMP_NTZ(n) (nanosecond-capable NTZ).|Present on this branch: sql / typeName like TIMESTAMP_NTZ(n); n in 0..9 today in code—the SPIP may narrow product scope to 7..9 while keeping storage general.|
|TimestampNanosType(n: Int)|SQL TIMESTAMP_LTZ(n) (nanosecond-capable
LTZ).|Proposed companion to NTZ; mirrors the same FSP parameter and
external/Java mapping pattern.|
Backward compatibility
* Existing singleton types TimestampNTZType and TimestampType remain the
defaults for unparameterized SQL and for older serialized schemas that do not
carry n.
* New types are additive; code that pattern-matches only on TimestampType /
TimestampNTZType continues to compile but must be reviewed for parity paths
when TimestampNanosType / TimestampNTZNanosType appear.
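A purely illustrative sketch of that review concern; the parameterized type names follow this proposal and do not exist in released Spark:
{code:scala}
import org.apache.spark.sql.types._

// A match written against today's singletons still compiles once the
// parameterized types exist, but silently misses them unless extended.
def isTimestampFamily(dt: DataType): Boolean = dt match {
  case TimestampType | TimestampNTZType                 => true  // today
  case TimestampNanosType(_) | TimestampNTZNanosType(_) => true  // proposed; must be added
  case _                                                => false
}
{code}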
h2. A.3 Compatibility summary
|Direction|Expectation|
|Old → new|Opt-in via TIMESTAMP_* (n) / new DataTypes; no change for legacy
microsecond types unless users choose nanosecond-capable types.|
|New → old|Downgrades may truncate/round per FSP rules; must be documented and
tested (ANSI throw vs non-ANSI null where applicable).|
|Cross-version|Schemas with nanosecond-capable types require a Spark version
that understands them; older engines must reject or require migration tools—not
silently coerce.|
h1. Appendix B. Design Sketch
A value for nanosecond-capable NTZ (and the same pair for LTZ):
* {*}epochMicros{*}: Long — signed epoch microseconds (same grid as
TimestampType / TimestampNTZType today).
* {*}nanosOfMicro{*}: Short in [0, 999] — remaining nanoseconds inside that
microsecond bucket.
{*}Invariant{*}: the pair is always normalized so *nanosOfMicro* stays in
range; excess carries into *epochMicros* with Math.addExact / floor-div where
needed.
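A minimal sketch of that normalization rule in plain Scala (the function name is illustrative):
{code:scala}
// Keep nanosOfMicro in [0, 999] and carry whole microseconds into epochMicros;
// Math.addExact makes out-of-range results fail loudly instead of wrapping.
def plusNanos(epochMicros: Long, nanosOfMicro: Short, deltaNanos: Long): (Long, Short) = {
  val totalNanos  = Math.addExact(deltaNanos, nanosOfMicro.toLong)
  val carryMicros = Math.floorDiv(totalNanos, 1000L)
  val restNanos   = Math.floorMod(totalNanos, 1000L).toShort    // always in [0, 999]
  (Math.addExact(epochMicros, carryMicros), restNanos)
}

// Subtracting 1 ns from an exact microsecond borrows from epochMicros:
// plusNanos(10L, 0, -1L) == (9L, 999.toShort)
{code}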
h4. Why this split (vs a single Long epoch-nanos counter):
* {*}Range{*}: *epochMicros* as Long keeps the calendar reach in the same
ballpark as today’s microsecond timestamps. A single INT64 epoch-nanoseconds
field has a much smaller representable year range; Spark can avoid that
user-visible cliff for the micros timeline part.
* *Interop/conversion cost:* most existing Spark datetime math is already
microsecond-grained; incremental changes upgrade paths by composing micro ops +
cheap nano remainder, instead of forcing every operator to immediately
normalize to nanoseconds-since-epoch.
* *Deterministic normalization:* *nanosOfMicro* is a small bounded correction
term - easy to audit in casts/parsers vs unconstrained nano arithmetic
everywhere.
TimestampNTZNanosType (and similar TimestampLTZNanosType) is the schema-level
description of TIMESTAMP_NTZ(p). The values for this type are not “a struct of
two children in the row” at runtime; the companion internalStructType exists
for metadata only (historical/compat hooks). Execution uses the normalized
logical pair epoch micros + nanoseconds within that micro via
org.apache.spark.unsafe.types.TimestampNTZNanos.
{code:scala}
/**
 * Timestamp without time zone with fractional-second precision up to
 * nanoseconds (9 decimal digits).
 */
@Unstable
case class TimestampNTZNanosType(precision: Int) extends DatetimeType {
  if (precision < 0 || precision > 9) {
    throw DataTypeErrors.unsupportedTimestampPrecisionError(precision)
  }

  /**
   * Default size used by Spark for row-size estimation.
   * Nanosecond-capable NTZ values are represented logically as epoch
   * microseconds (`Long`, 8 bytes) plus nanoseconds within that micro
   * (`Short`, 2 bytes). Size estimation sums those fixed logical
   * parts (10 bytes). Physical encoding in
   * [[org.apache.spark.sql.catalyst.expressions.UnsafeRow]]
   * uses a separate layout (one fixed pointer word plus a variable-length
   * payload).
   */
  override def defaultSize: Int = 10

  override def typeName: String = s"timestamp_ntz($precision)"

  override def sql: String = s"TIMESTAMP_NTZ($precision)"

  private[spark] override def asNullable: TimestampNTZNanosType = this
}
{code}
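Assuming the type ships as sketched above, usage would look like this (illustrative only):
{code:scala}
val t = TimestampNTZNanosType(9)
t.sql          // "TIMESTAMP_NTZ(9)"
t.typeName     // "timestamp_ntz(9)"
t.defaultSize  // 10

// Out-of-range precision is rejected at construction time:
// TimestampNTZNanosType(12) throws via unsupportedTimestampPrecisionError.
{code}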
h1. Appendix C. Rejected Designs
This SPIP converged on a logical pair: epoch microseconds (Long) + nanoseconds
within that micro (0..999, typically Short) with a single normalization rule.
Other encodings were considered and set aside for the reasons below.
h2. C.1 Epoch nanoseconds in a single Long
* {*}Reference{*}: related direction in the broader nanosecond-timestamp SPIP
/ design thread ([Google Doc: SPIP: Support NanoSecond Timestamp
Types|https://docs.google.com/document/d/1Q5u1whAO_KcT6d4dFFaIMy_S3RoQEo4Znwz2U-nbhls/edit?tab=t.0#heading=h.xk16mmomv6il]).
* {*}Idea{*}: store the instant as signed 64-bit nanoseconds since Unix epoch.
* *Why rejected* as the primary internal model for Spark’s timestamp execution:
** Much smaller representable calendar range in INT64 nanoseconds than INT64
microseconds (a well-known “range cliff” vs today’s microsecond timestamps).
** Higher integration cost with the existing engine, which is overwhelmingly
organized around microsecond-grained datetime rules; everything would either
convert constantly or risk drift.
** Not compatible with the SQL standard.
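For concreteness, the range cliff mentioned above can be computed directly (plain Scala, no Spark dependencies):
{code:scala}
import java.time.Instant
import java.time.temporal.ChronoUnit

// Signed 64-bit nanoseconds since the Unix epoch top out in 2262
// (2262-04-11T23:47:16.854775807Z) and bottom out in 1677.
val maxEpochNanos = Instant.EPOCH.plusNanos(Long.MaxValue)

// Signed 64-bit microseconds (today's Spark grid) reach roughly +/-292,000
// years around 1970, so the parameterized types keep that calendar reach.
val maxEpochMicros = Instant.EPOCH.plus(Long.MaxValue, ChronoUnit.MICROS)
{code}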
h2. C.2 Seconds + nanos of second
* *Idea:* decompose time into whole seconds and 0..999_999_999 nanos within
the second.
* *Why rejected:*
** Mismatches Spark’s native microsecond “currency” (long micros) used across
most of Catalyst; forces pervasive rescaling and increases codegen/interpreted
drift risk.
** Second-bucket arithmetic is awkward for interval APIs and
microsecond-legacy behavior (many paths think in micros, not “seconds +
intra-second nanos”).
h2. C.3 Days + nanos within the day
* {*}Idea{*}: pack calendar date as day index and time-of-day as nanoseconds
within the day.
* *Why rejected:*
** Not aligned with how Spark’s datetime system reasons about instants (epoch
micros + zone rules, legacy rebasing, Julian/Gregorian transitions, etc.).
** Day-boundary corner cases (leap seconds aside; DST is less relevant to NTZ/LTZ storage, but the model still composes poorly with “micros since epoch” execution).
h2. C.4 Nanos-from-epoch + Byte “high extension”
* {*}Idea{*}: keep nanosecond resolution but add an extra byte to extend
effective range beyond pure INT64 nanos limits.
* *Why rejected:*
** Non-standard, harder to explain, and complicates every operator (suddenly
values are a 2-part variable-precision integer in the hot path).
** Interoperability pain: external systems and file formats won’t natively
match the “byte extension” scheme; you still convert at boundaries.
> SPIP: Timestamps with nanosecond precision
> ------------------------------------------
>
> Key: SPARK-56822
> URL: https://issues.apache.org/jira/browse/SPARK-56822
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]