(spark) branch master updated: [SPARK-56876][SQL] Add TimestampNTZNanosType and TimestampLTZNanosType

maxgekk Thu, 21 May 2026 00:07:31 -0700

This is an automated email from the ASF dual-hosted git repository.

MaxGekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 1e59b7b49b14 [SPARK-56876][SQL] Add TimestampNTZNanosType and 
TimestampLTZNanosType
1e59b7b49b14 is described below

commit 1e59b7b49b14f85f7409911e7b70169c1c085dda
Author: Maxim Gekk <[email protected]>
AuthorDate: Thu May 21 09:07:04 2026 +0200

    [SPARK-56876][SQL] Add TimestampNTZNanosType and TimestampLTZNanosType
    
    ### What changes were proposed in this pull request?
    
    In the PR, I propose to extend the Spark SQL type system, and add new 
classes to Scala/Java APIs:
    
    * TimestampNTZNanosType(p) represents the SQL data type TIMESTAMP_NTZ(p)
    * TimestampLTZNanosType(p) represents the SQL data type TIMESTAMP_LTZ(p)
    
    They are public API entry points only, and have no SQL/DDL/datasource 
integration in this PR.
    
    The classes align with the SQL standard’s direction for optional feature 
F555, “Enhanced seconds precision”: datetime types can carry fractional seconds 
with precision p in the SECOND field beyond the traditional six decimal places 
(microseconds). Here p is restricted to 7, 8, and 9, i.e. the 
nanosecond-capable band (up to nine fractional digits, nanoseconds in the 
second field).
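    
    As a rough illustration of what p means (a sketch, not code from this PR):
    p fractional digits correspond to a resolution of 10^(9 - p) nanoseconds.
    The object FractionalPrecision and its method resolutionNanos are
    hypothetical names; they only mirror the [7, 9] bounds described above.
    
    ```scala
    object FractionalPrecision {
      // Smallest representable step, in nanoseconds, for p fractional digits.
      // Hypothetical helper: accepts only p in [7, 9], like the new types.
      def resolutionNanos(p: Int): Long = {
        require(p >= 7 && p <= 9, s"precision $p is outside [7, 9]")
        // Multiply by 10 once per digit that is missing relative to nanoseconds.
        (0 until (9 - p)).foldLeft(1L)((acc, _) => acc * 10L)
      }
    }
    ```
    
    Under this reading, p = 7 resolves to 100 ns, p = 8 to 10 ns, and p = 9 to
    1 ns.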
    
    The logical layout documented on the classes matches this precision story: 
epoch microseconds plus nanoseconds within that microsecond, with a default 
estimated width of 10 bytes for planning (8 + 2).
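    
    The layout above can be sketched with java.time alone. This is an
    illustration under assumptions, not the internal representation added by
    this PR: the object MicrosPlusNanos and its split/join methods are
    hypothetical names, and the (Long, Short) pair simply follows the 8 + 2
    byte description.
    
    ```scala
    import java.time.Instant

    object MicrosPlusNanos {
      // Split an instant into (epoch microseconds, nanoseconds within that
      // microsecond). Instant.getNano is always in [0, 999999999], so the
      // remainder is in [0, 999] and fits in a Short, also before 1970.
      def split(instant: Instant): (Long, Short) = {
        val micros = Math.addExact(
          Math.multiplyExact(instant.getEpochSecond, 1000000L),
          (instant.getNano / 1000).toLong)
        val nanosWithinMicro = (instant.getNano % 1000).toShort
        (micros, nanosWithinMicro)
      }

      // Inverse of split: rebuild the instant from the two components.
      def join(micros: Long, nanosWithinMicro: Short): Instant = {
        val seconds = Math.floorDiv(micros, 1000000L)
        val microsOfSecond = Math.floorMod(micros, 1000000L)
        Instant.ofEpochSecond(seconds, microsOfSecond * 1000L + nanosWithinMicro)
      }
    }
    ```
    
    For example, splitting 2026-05-21T07:07:04.123456789Z leaves 789 in the
    nanosecond component, and join restores the original instant.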
    
    Parameterless timestamp_ntz / timestamp_ltz are unchanged and remain the 
existing microsecond-oriented types.
    
    ### Why are the changes needed?
    
    New timestamp types are useful for Spark SQL users because they allow users to:
    
    1. Represent timestamp without time zone and timestamp with local time zone 
with fractional-second precision 7–9, in line with SQL optional feature F555 
(Enhanced seconds precision).
    2. Describe schemas from other systems that already use nanosecond-capable 
timestamps, without overloading microsecond timestamp_ntz / timestamp_ltz 
types.
    3. Migrate SQL and metadata that distinguish NTZ and LTZ at sub-microsecond 
precision toward Spark in small, reviewable steps.
    4. Prepare later work to read and write such columns from datasources and 
JDBC, and to apply optimizations that depend on precise timestamp types.
    
    ### Does this PR introduce *any* user-facing change?
    
    The public API adds two new types in org.apache.spark.sql.types; they cannot 
yet be used in DataFrames, schemas read from datasources, or SQL DDL.
    
    ### How was this patch tested?
    
    By extending DataTypeSuite (round-trip and precision bounds for the new 
types, including invalid precisions).
    ```
    $ build/sbt "test:testOnly *DataTypeSuite"
    ```
    Plus SparkThrowableSuite / error-json validation if error-conditions.json 
is updated.
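    
    The precision-bound cases exercised by the tests can be approximated
    without Spark. The sketch below reuses the shape of the regexes added to
    DataType.scala, but the object NanosTypeName, its parse method, and the
    Either-based result are hypothetical. In the actual change, out-of-range
    or overflowing precisions raise INVALID_TIMESTAMP_PRECISION, and names
    that do not match the regex fall through to INVALID_JSON_DATA_TYPE.
    
    ```scala
    object NanosTypeName {
      // Lowercase name, optional whitespace around an unsigned digit string.
      private val Parameterized =
        """(timestamp_ltz|timestamp_ntz)\(\s*(\d+)\s*\)""".r

      // Right((name, precision)) for a valid name, Left(reason) otherwise.
      def parse(raw: String): Either[String, (String, Int)] = raw match {
        case Parameterized(name, digits) =>
          // toInt can overflow on long digit strings, so go through Try.
          scala.util.Try(digits.toInt).toOption.filter(p => p >= 7 && p <= 9) match {
            case Some(p) => Right((name, p))
            case None => Left(s"invalid precision $digits for $name")
          }
        case other => Left(s"unrecognized type name $other")
      }
    }
    ```
    
    Keeping the digit string as text until validation is what lets an
    overflowing value be reported verbatim instead of surfacing as a
    NumberFormatException.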
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Opus 4.7
    
    Closes #55952 from MaxGekk/nanos-add-types.
    
    Authored-by: Maxim Gekk <[email protected]>
    Signed-off-by: Max Gekk <[email protected]>
---
 .../src/main/resources/error/error-conditions.json |  6 ++
 .../apache/spark/sql/errors/DataTypeErrors.scala   |  7 ++
 .../org/apache/spark/sql/types/DataType.scala      | 14 ++++
 .../spark/sql/types/TimestampLTZNanosType.scala    | 62 ++++++++++++++
 .../spark/sql/types/TimestampNTZNanosType.scala    | 62 ++++++++++++++
 .../org/apache/spark/sql/types/DataTypeSuite.scala | 94 ++++++++++++++++++++++
 6 files changed, 245 insertions(+)

diff --git a/common/utils/src/main/resources/error/error-conditions.json 
b/common/utils/src/main/resources/error/error-conditions.json
index 119a1d5d42b4..997c3d976b12 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -4786,6 +4786,12 @@
     ],
     "sqlState" : "42K0F"
   },
+  "INVALID_TIMESTAMP_PRECISION" : {
+    "message" : [
+      "The seconds precision <precision> of <type> is invalid. Expected an 
integer in [7, 9], or parameterless <type> for precision <= 6."
+    ],
+    "sqlState" : "22023"
+  },
   "INVALID_TIMEZONE" : {
     "message" : [
       "The timezone: <timeZone> is invalid. The timezone must be either a 
region-based zone ID or a zone offset. Region IDs must have the form 
'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format 
'(+|-)HH', '(+|-)HH:mm’ or '(+|-)HH:mm:ss', e.g '-08' , '+01:00' or 
'-13:33:33', and must be in the range from -18:00 to +18:00. 'Z' and 'UTC' are 
accepted as synonyms for '+00:00'."
diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
index 1e2b2e691cd3..6e8cb8077be8 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
@@ -275,4 +275,11 @@ private[sql] object DataTypeErrors extends 
DataTypeErrorsBase {
       messageParameters = Map("precision" -> precision.toString),
       cause = null)
   }
+
+  def invalidTimestampPrecisionError(precision: String, typeName: String): 
Throwable = {
+    new SparkException(
+      errorClass = "INVALID_TIMESTAMP_PRECISION",
+      messageParameters = Map("precision" -> precision, "type" -> typeName),
+      cause = null)
+  }
 }
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala
index 48a6514440dd..fbd70cf8b899 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala
@@ -127,6 +127,8 @@ object DataType {
   private val CHAR_TYPE = """char\(\s*(\d+)\s*\)""".r
   private val VARCHAR_TYPE = """varchar\(\s*(\d+)\s*\)""".r
   private val STRING_WITH_COLLATION = """string\s+collate\s+(\w+)""".r
+  private val TIMESTAMP_LTZ_NANOS_TYPE = """timestamp_ltz\(\s*(\d+)\s*\)""".r
+  private val TIMESTAMP_NTZ_NANOS_TYPE = """timestamp_ntz\(\s*(\d+)\s*\)""".r
   private val GEOMETRY_TYPE = """geometry\(\s*([\w]+:-?[\w]+)\s*\)""".r
   private val GEOGRAPHY_TYPE_CRS = """geography\(\s*(\w+:-?\w+)\s*\)""".r
   private val GEOGRAPHY_TYPE_ALG = """geography\(\s*(\w+)\s*\)""".r
@@ -233,6 +235,18 @@ object DataType {
       case GEOGRAPHY_TYPE_CRS_ALG(crs, alg) => GeographyType(crs, alg)
       // For backwards compatibility, previously the type name of NullType is 
"null"
       case "null" => NullType
+      case TIMESTAMP_LTZ_NANOS_TYPE(precision) =>
+        try TimestampLTZNanosType(precision.toInt)
+        catch {
+          case _: NumberFormatException =>
+            throw DataTypeErrors.invalidTimestampPrecisionError(precision, 
"TIMESTAMP_LTZ")
+        }
+      case TIMESTAMP_NTZ_NANOS_TYPE(precision) =>
+        try TimestampNTZNanosType(precision.toInt)
+        catch {
+          case _: NumberFormatException =>
+            throw DataTypeErrors.invalidTimestampPrecisionError(precision, 
"TIMESTAMP_NTZ")
+        }
       case "timestamp_ltz" => TimestampType
       case other =>
         otherTypes.getOrElse(
diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampLTZNanosType.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampLTZNanosType.scala
new file mode 100644
index 000000000000..7d65a492f544
--- /dev/null
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampLTZNanosType.scala
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types
+
+import org.apache.spark.annotation.Unstable
+import org.apache.spark.sql.errors.DataTypeErrors
+
+/**
+ * Timestamp with local time zone with fractional-second precision in the 
nanosecond-capable range
+ * (7 to 9 decimal digits). Represents a time instant analogous to 
`TimestampType`, but with
+ * sub-microsecond precision: valid range is [0001-01-01T00:00:00.000000000Z,
+ * 9999-12-31T23:59:59.999999999Z] in the proleptic Gregorian calendar at 
UTC+00:00. No time zone
+ * is stored; the session time zone is used when converting values to and from 
text.
+ *
+ * @param precision
+ *   Number of digits of fractional seconds for this SQL type. The valid 
values are 7, 8, and 9
+ *   where 9 means nanosecond precision.
+ *
+ * @since 4.2.0
+ */
+@Unstable
+case class TimestampLTZNanosType(precision: Int) extends DatetimeType {
+
+  if (precision < TimestampLTZNanosType.MIN_PRECISION ||
+    precision > TimestampLTZNanosType.MAX_PRECISION) {
+    throw DataTypeErrors.invalidTimestampPrecisionError(precision.toString, 
"TIMESTAMP_LTZ")
+  }
+
+  /**
+   * Default size used by Spark for row-size estimation. Values are 
represented logically as epoch
+   * microseconds (Long, 8 bytes) plus nanoseconds within that micro (Short, 2 
bytes).
+   */
+  override def defaultSize: Int = 10
+
+  override def typeName: String = s"timestamp_ltz($precision)"
+
+  private[spark] override def asNullable: TimestampLTZNanosType = this
+}
+
+object TimestampLTZNanosType {
+  val MIN_PRECISION: Int = 7
+  val MAX_PRECISION: Int = 9
+  val NANOS_PRECISION: Int = 9
+  val DEFAULT_PRECISION: Int = NANOS_PRECISION
+
+  def apply(): TimestampLTZNanosType = new 
TimestampLTZNanosType(DEFAULT_PRECISION)
+}
diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampNTZNanosType.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampNTZNanosType.scala
new file mode 100644
index 000000000000..722e0f2d25ed
--- /dev/null
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampNTZNanosType.scala
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types
+
+import org.apache.spark.annotation.Unstable
+import org.apache.spark.sql.errors.DataTypeErrors
+
+/**
+ * Timestamp without time zone with fractional-second precision in the 
nanosecond-capable range (7
+ * to 9 decimal digits). Represents a local date-time analogous to 
`TimestampNTZType`, but with
+ * sub-microsecond precision: valid range is [0001-01-01T00:00:00.000000000,
+ * 9999-12-31T23:59:59.999999999] in the proleptic Gregorian calendar. The 
value is independent of
+ * any time zone. To represent an absolute point in time, use 
`TimestampLTZNanosType` instead.
+ *
+ * @param precision
+ *   Number of digits of fractional seconds for this SQL type. The valid 
values are 7, 8, and 9
+ *   where 9 means nanosecond precision.
+ *
+ * @since 4.2.0
+ */
+@Unstable
+case class TimestampNTZNanosType(precision: Int) extends DatetimeType {
+
+  if (precision < TimestampNTZNanosType.MIN_PRECISION ||
+    precision > TimestampNTZNanosType.MAX_PRECISION) {
+    throw DataTypeErrors.invalidTimestampPrecisionError(precision.toString, 
"TIMESTAMP_NTZ")
+  }
+
+  /**
+   * Default size used by Spark for row-size estimation. Values are 
represented logically as epoch
+   * microseconds (Long, 8 bytes) plus nanoseconds within that micro (Short, 2 
bytes).
+   */
+  override def defaultSize: Int = 10
+
+  override def typeName: String = s"timestamp_ntz($precision)"
+
+  private[spark] override def asNullable: TimestampNTZNanosType = this
+}
+
+object TimestampNTZNanosType {
+  val MIN_PRECISION: Int = 7
+  val MAX_PRECISION: Int = 9
+  val NANOS_PRECISION: Int = 9
+  val DEFAULT_PRECISION: Int = NANOS_PRECISION
+
+  def apply(): TimestampNTZNanosType = new 
TimestampNTZNanosType(DEFAULT_PRECISION)
+}
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
index ce4f5e89be2b..1a7524dbc5a7 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql.types
 
+import java.util.Locale
+
 import com.fasterxml.jackson.core.JsonParseException
 import org.json4s.jackson.JsonMethods
 
@@ -255,6 +257,13 @@ class DataTypeSuite extends SparkFunSuite {
   checkDataTypeFromJson(TimestampNTZType)
   checkDataTypeFromDDL(TimestampNTZType)
 
+  
checkDataTypeFromJson(TimestampLTZNanosType(TimestampLTZNanosType.MIN_PRECISION))
+  checkDataTypeFromJson(TimestampLTZNanosType(8))
+  
checkDataTypeFromJson(TimestampLTZNanosType(TimestampLTZNanosType.MAX_PRECISION))
+  
checkDataTypeFromJson(TimestampNTZNanosType(TimestampNTZNanosType.MIN_PRECISION))
+  checkDataTypeFromJson(TimestampNTZNanosType(8))
+  
checkDataTypeFromJson(TimestampNTZNanosType(TimestampNTZNanosType.MAX_PRECISION))
+
   checkDataTypeFromJson(StringType)
   checkDataTypeFromDDL(StringType)
 
@@ -403,6 +412,10 @@ class DataTypeSuite extends SparkFunSuite {
   dayTimeIntervalTypes.foreach(checkDefaultSize(_, 8))
   checkDefaultSize(TimeType(TimeType.MIN_PRECISION), 8)
   checkDefaultSize(TimeType(TimeType.MAX_PRECISION), 8)
+  checkDefaultSize(TimestampLTZNanosType(TimestampLTZNanosType.MIN_PRECISION), 
10)
+  checkDefaultSize(TimestampLTZNanosType(TimestampLTZNanosType.MAX_PRECISION), 
10)
+  checkDefaultSize(TimestampNTZNanosType(TimestampNTZNanosType.MIN_PRECISION), 
10)
+  checkDefaultSize(TimestampNTZNanosType(TimestampNTZNanosType.MAX_PRECISION), 
10)
 
   def checkEqualsIgnoreCompatibleNullability(
       from: DataType,
@@ -1448,6 +1461,87 @@ class DataTypeSuite extends SparkFunSuite {
       parameters = Map("error" -> "'time'", "hint" -> ""))
   }
 
+  test("SPARK-56876: precisions of nanos-capable TIMESTAMP_LTZ and 
TIMESTAMP_NTZ types") {
+    TimestampLTZNanosType.MIN_PRECISION to TimestampLTZNanosType.MAX_PRECISION 
foreach { p =>
+      assert(TimestampLTZNanosType(p).sql === s"TIMESTAMP_LTZ($p)")
+      assert(TimestampNTZNanosType(p).sql === s"TIMESTAMP_NTZ($p)")
+    }
+
+    Seq(6, 10, Int.MinValue, Int.MaxValue).foreach { p =>
+      checkError(
+        exception = intercept[SparkException] {
+          TimestampLTZNanosType(p)
+        },
+        condition = "INVALID_TIMESTAMP_PRECISION",
+        parameters = Map("precision" -> p.toString, "type" -> "TIMESTAMP_LTZ"))
+      checkError(
+        exception = intercept[SparkException] {
+          TimestampNTZNanosType(p)
+        },
+        condition = "INVALID_TIMESTAMP_PRECISION",
+        parameters = Map("precision" -> p.toString, "type" -> "TIMESTAMP_NTZ"))
+    }
+  }
+
+  test("SPARK-56876: parse timestamp with nanosecond precision from JSON") {
+    // (json-type-name, sql-type-name-in-error, factory)
+    val variants = Seq[(String, String, Int => DataType)](
+      ("timestamp_ltz", "TIMESTAMP_LTZ", TimestampLTZNanosType(_)),
+      ("timestamp_ntz", "TIMESTAMP_NTZ", TimestampNTZNanosType(_)))
+    val overflowing = "9" * 20
+
+    variants.foreach { case (name, sqlTypeName, factory) =>
+      // Happy path across valid precisions, tolerant of surrounding 
whitespace.
+      TimestampLTZNanosType.MIN_PRECISION to 
TimestampLTZNanosType.MAX_PRECISION foreach { n =>
+        assert(DataType.fromJson(s"""\"$name($n)\"""") === factory(n))
+        assert(DataType.fromJson(s"""\"$name( $n)\"""") === factory(n))
+        assert(DataType.fromJson(s"""\"$name($n )\"""") === factory(n))
+      }
+
+      // Out-of-range precisions surface as INVALID_TIMESTAMP_PRECISION. The 
overflowing
+      // case verifies the original digit string is preserved instead of 
leaking
+      // NumberFormatException.
+      Seq("0", "6", "10", overflowing).foreach { p =>
+        checkError(
+          exception = intercept[SparkException] {
+            DataType.fromJson(s"""\"$name($p)\"""")
+          },
+          condition = "INVALID_TIMESTAMP_PRECISION",
+          parameters = Map("precision" -> p, "type" -> sqlTypeName))
+      }
+
+      // Malformed precision forms that don't match the regex fall through to
+      // INVALID_JSON_DATA_TYPE: negative, empty parens, non-numeric, and 
uppercase
+      // (JSON type-name convention is lowercase).
+      Seq(
+        s"$name(-1)",
+        s"$name()",
+        s"$name(abc)",
+        s"${name.toUpperCase(Locale.ROOT)}(7)").foreach { raw =>
+        checkError(
+          exception = intercept[SparkIllegalArgumentException] {
+            DataType.fromJson(s"""\"$raw\"""")
+          },
+          condition = "INVALID_JSON_DATA_TYPE",
+          parameters = Map("invalidType" -> raw))
+      }
+    }
+
+    // JSON round-trip for nanos timestamp types inside struct, array, and map.
+    val structWithNanos = StructType(Seq(
+      StructField("ntz", TimestampNTZNanosType(7)),
+      StructField("ltz", TimestampLTZNanosType(8))))
+    assert(DataType.fromJson(structWithNanos.json) === structWithNanos)
+    val arrayOfNanos = ArrayType(TimestampNTZNanosType(9), containsNull = 
false)
+    assert(DataType.fromJson(arrayOfNanos.json) === arrayOfNanos)
+    val mapOfNanos = MapType(StringType, TimestampNTZNanosType(7), 
valueContainsNull = true)
+    assert(DataType.fromJson(mapOfNanos.json) === mapOfNanos)
+
+    // Bare names without parens still map to the legacy single-precision 
types.
+    assert(DataType.fromJson("\"timestamp_ltz\"") === TimestampType)
+    assert(DataType.fromJson("\"timestamp_ntz\"") === TimestampNTZType)
+  }
+
   test("singleton DataType equality after deserialization") {
     // Singleton DataTypes that use `case object` pattern matching (e.g., 
`case BinaryType =>`).
     // If a non-singleton instance is created (e.g., via Kryo deserialization 
which doesn't call


