Hi all,

I'd like to start discussion on a SPIP for adding nanosecond-precision
timestamp support to Spark SQL.

SPIP doc:
https://docs.google.com/document/d/1Q5u1whAO_KcT6d4dFFaIMy_S3RoQEo4Znwz2U-nbhls/edit?usp=sharing
JIRA: https://issues.apache.org/jira/browse/SPARK-56159

*Motivation*

Spark's timestamp types are microsecond-precision only. Reading nanosecond
Parquet files either throws an AnalysisException or falls back to LongType,
losing all timestamp semantics. Iceberg V3's timestamp_ns / timestamptz_ns
columns are similarly unsupported.


*Proposal*
We propose adding two new singleton types — TimestampNanosType and
TimestampNTZNanosType — extending DatetimeType, stored internally as a Long
representing nanoseconds since the Unix epoch. We also propose TIMESTAMP(p)
parameterized SQL syntax (p = 0–9) that maps to existing singleton types at
parse time.

   - INT64 epoch nanos internal representation, matching Parquet, Arrow,
   Iceberg V3, and Pandas. Trade-off: date range is ~1677–2262 (same as
   Parquet spec).
   - Singleton types, not parameterized case classes. TimestampType remains
   an @Stable singleton — no binary compatibility break.
   - Both LTZ and NTZ variants, required for correct Parquet and Iceberg V3
   interoperability.
   - CHAR/VARCHAR metadata pattern for TIMESTAMP(p) precision enforcement,
   avoiding DecimalType-like complexity.
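
As a quick sanity check on the range trade-off above, the span representable
by INT64 epoch nanoseconds can be computed directly (a standalone Python
sketch, not Spark code):

```python
from datetime import datetime, timedelta, timezone

# INT64 nanoseconds since the Unix epoch: the same layout used by
# Parquet, Arrow, Iceberg V3, and Pandas for nanosecond timestamps.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def int64_nanos_year_bounds():
    # Convert the extreme Long values from nanoseconds to microseconds
    # (Python timedelta resolution) to find the representable years.
    lo = EPOCH + timedelta(microseconds=(-2**63) // 1000)
    hi = EPOCH + timedelta(microseconds=(2**63 - 1) // 1000)
    return lo.year, hi.year

print(int64_nanos_year_bounds())  # (1677, 2262)
```

which matches the ~1677–2262 range in the Parquet spec.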



*Addressing Previous Feedback*
This builds on the previous nanosecond timestamp discussion thread [1].
We've gone through the feedback and concerns raised there; this proposal
addresses each point:


   1. 10-byte storage and non-standard format → We now use INT64 epoch
   nanos — the de facto standard used by Parquet, Arrow, Iceberg, Avro, and
   Pandas. No Arrow compatibility issue, zero conversion overhead.
   2. Precision overlap and type confusion → Replaced the parameterized
   TimestampNsNTZType(precision) (where precision=6 overlapped with existing
   types) with fixed-precision singletons. TIMESTAMP(p) SQL syntax is
   supported but resolved to singletons at parse time (p=0–6 → micros, p=7–9 →
   nanos).
   3. Schema inference unspecified → Fully specified.
   spark.sql.parquet.inferTimestampNanos.enabled controls Parquet
   inference. Avro's timestamp-nanos and Iceberg V3's timestamp_ns map
   directly. Untyped formats (CSV/JSON) default to microseconds for backward
   compatibility.
   4. NTZ-only vs both LTZ and NTZ → Both variants are needed. Parquet
   defines TIMESTAMP(NANOS, true/false) and Iceberg V3 defines both
   timestamptz_ns and timestamp_ns.
   5. Parameterized type complexity → Singletons, not parameterized case
   classes. Precision enforcement uses the proven CHAR/VARCHAR metadata
   pattern.
   6. Casting rules and type coercion undefined → A complete 7×7 cast
   matrix and coercion rules are specified in the doc, following a
   "precision first, then timezone" principle, with a cross-engine
   comparison.


The SPIP doc contains full details including API changes, internal
representation, cast matrix, type coercion rules, TIMESTAMP(p) design, and
a phased milestone plan.
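
To make the "precision first" casting direction concrete, here is a minimal
sketch (my own illustration, not the proposed implementation) of the value
conversion a nanos-to-micros cast implies:

```python
def cast_nanos_to_micros(epoch_nanos: int) -> int:
    # Narrowing TIMESTAMP(9) to TIMESTAMP(6) drops sub-microsecond
    # digits; floor division keeps ordering consistent for negative
    # (pre-epoch) values too.
    return epoch_nanos // 1_000

def cast_micros_to_nanos(epoch_micros: int) -> int:
    # Widening is exact: every microsecond instant has a nanos value.
    return epoch_micros * 1_000

print(cast_nanos_to_micros(1_234_567_891))  # 1234567 (891 ns dropped)
```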

Looking forward to your feedback.

Thanks,
Xiaoxuan Li

[1] https://lists.apache.org/thread/4r25lhtrg9cog956m2fldodt64dgt45j
