(spark) branch branch-4.x updated: [SPARK-57162][SQL] Add nanosecond-aware TimestampFormatter for parsing and formatting TimestampNanosVal

maxgekk Thu, 04 Jun 2026 00:03:14 -0700

This is an automated email from the ASF dual-hosted git repository.

MaxGekk pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-4.x by this push:
     new 866af0dddc1b [SPARK-57162][SQL] Add nanosecond-aware 
TimestampFormatter for parsing and formatting TimestampNanosVal
866af0dddc1b is described below

commit 866af0dddc1b81b07ecc17c279283ca8b8ac0d4c
Author: Maxim Gekk <[email protected]>
AuthorDate: Thu Jun 4 09:02:35 2026 +0200

    [SPARK-57162][SQL] Add nanosecond-aware TimestampFormatter for parsing and 
formatting TimestampNanosVal
    
    ### What changes were proposed in this pull request?
    
    Extend the `TimestampFormatter` family with additive, nanosecond-aware 
parse and format methods that produce and consume 
`org.apache.spark.unsafe.types.TimestampNanosVal` (`epochMicros: Long` + 
`nanosWithinMicro: Short` in `[0, 999]`) at a target fractional precision `p` 
in `[7, 9]`:
    
    - New trait methods: `parseNanos` / `parseNanosOptional` (LTZ), 
`parseWithoutTimeZoneNanos` / `parseWithoutTimeZoneNanosOptional` (NTZ, plus a 
`final` overload that passes `allowTimeZone = true`), and `formatNanos` (LTZ) / 
`formatWithoutTimeZoneNanos` (NTZ).
    - `Iso8601TimestampFormatter`: `extractNanos` / `extractNanosNTZ` build the 
`Instant` / `LocalDateTime` and delegate to 
`SparkDateTimeUtils.instantToTimestampNanos` / `localDateTimeToTimestampNanos`; 
`formatNanos` floors sub-`precision` digits and renders the reconstructed 
instant.
    - `DefaultTimestampFormatter`: delegates to the SPARK-57032 nanos entry 
points.
    - `LegacyFastTimestampFormatter` / `LegacySimpleTimestampFormatter`: 
explicitly reject nanosecond precision under the `LEGACY` time parser policy 
(they cap at millisecond/microsecond resolution).
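
The legacy rejection in the last bullet is also why the `*Optional` nanos methods are abstract in the trait: a default that wraps `parseNanos` in `try`/`catch` would turn the unsupported-feature error into `None`. A minimal sketch of that effect follows. `SketchFormatter`, `LegacySketchFormatter`, and both `parseOptional*` method names are hypothetical stand-ins for illustration, not Spark classes or methods.

```scala
import scala.util.control.NonFatal

// Hypothetical, simplified formatter hierarchy; these are not Spark classes.
trait SketchFormatter {
  def parse(s: String): Long

  // A swallowing default: every non-fatal failure becomes None,
  // including an "unsupported feature" error raised by parse.
  def parseOptionalSwallowing(s: String): Option[Long] =
    try Some(parse(s)) catch { case NonFatal(_) => None }

  // Abstract variant: each implementation decides how failures surface.
  def parseOptionalExplicit(s: String): Option[Long]
}

object LegacySketchFormatter extends SketchFormatter {
  private def unsupported(): UnsupportedOperationException =
    new UnsupportedOperationException(
      "nanosecond timestamps are unsupported under the LEGACY policy")

  override def parse(s: String): Long = throw unsupported()

  // Throws instead of returning None, so the error reaches the caller.
  override def parseOptionalExplicit(s: String): Option[Long] = throw unsupported()
}
```

With the swallowing default the caller only sees `None` and cannot tell a malformed string from an unsupported configuration; the explicit variant keeps the two cases distinguishable.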
    
    Sub-precision fractional digits are truncated (floored), consistent with 
SPARK-57032. All existing microsecond methods are unchanged (additive API).
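
The split and the truncation rule can be sketched as follows. This is an illustration under stated assumptions, not the Spark implementation: `NanosSketch` is a hypothetical stand-in for `TimestampNanosVal`, and `fromInstant` only approximates what `SparkDateTimeUtils.instantToTimestampNanos` is described as doing.

```scala
import java.time.Instant

// Hypothetical stand-in for TimestampNanosVal (epoch microseconds plus a
// sub-microsecond remainder in [0, 999]); not the Spark class.
final case class NanosSketch(epochMicros: Long, nanosWithinMicro: Short)

object NanosTruncationSketch {
  // Floor the sub-microsecond remainder (0..999) to precision p in [7, 9].
  def truncate(nanosWithinMicro: Int, precision: Int): Int = precision match {
    case 7 => (nanosWithinMicro / 100) * 100
    case 8 => (nanosWithinMicro / 10) * 10
    case 9 => nanosWithinMicro
    case _ =>
      throw new IllegalArgumentException(s"precision $precision is out of range [7, 9]")
  }

  def fromInstant(instant: Instant, precision: Int): NanosSketch = {
    // getNano is always in 0..999999999, also for pre-epoch instants,
    // so the remainder is non-negative and flooring ignores the epoch sign.
    val nanoOfSecond = instant.getNano
    val epochMicros = Math.addExact(
      Math.multiplyExact(instant.getEpochSecond, 1000000L),
      (nanoOfSecond / 1000).toLong)
    NanosSketch(epochMicros, truncate(nanoOfSecond % 1000, precision).toShort)
  }
}
```

For `1970-01-01T00:00:00.123456789Z` this yields `epochMicros = 123456` with a remainder of 789, 780, or 700 for `p` = 9, 8, or 7, matching the truncation cases in `TimestampFormatterSuite`.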
    
    ### Why are the changes needed?
    
    Today `TimestampFormatter` is microsecond-only and discards the 7th-9th 
fractional digits. The JSON and CSV datasources drive all timestamp 
parsing/formatting through `TimestampFormatter`, so they cannot round-trip 7-9 
digit fractions until the formatter is nanos-aware. This is the foundational 
unblocker for nanosecond support in those datasources (parent: SPARK-56822).
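
To make the lost digits concrete, here is a small sketch that uses plain `java.time` rather than any Spark API. `FractionDigitsSketch` and its two methods are hypothetical names; the microsecond truncation merely imitates what a microsecond-only value can carry.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

object FractionDigitsSketch {
  // Nine `S` letters: parse and print all nine fractional digits.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS")

  // Keeps full nanosecond resolution between parse and format.
  def roundTripNanos(s: String): String =
    LocalDateTime.parse(s, fmt).format(fmt)

  // Imitates a microsecond-only intermediate value: digits 7-9 are dropped.
  def roundTripMicros(s: String): String =
    LocalDateTime.parse(s, fmt).truncatedTo(ChronoUnit.MICROS).format(fmt)
}
```

Under these assumptions, `roundTripNanos` returns its input unchanged, while `roundTripMicros` replaces the last three fractional digits with zeros.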
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. The new formatter API is additive, and its callers gate its use behind 
`spark.sql.timestampNanosTypes.enabled`.
    
    ### How was this patch tested?
    
    New cases in `TimestampFormatterSuite`: parse/format round-trip for `p` in 
`[7, 9]` across ISO default and custom patterns (LTZ and NTZ); boundary values 
(`nanosWithinMicro` 0 and 999, pre-epoch instants, the 0001/1582/1970/9999 
corpus); truncation rule; NTZ time-zone rejection; and LEGACY-mode rejection.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Cursor (Claude Opus 4.8)
    
    Closes #56295 from MaxGekk/spark-57162-nanos-timestamp-formatter.
    
    Authored-by: Maxim Gekk <[email protected]>
    Signed-off-by: Max Gekk <[email protected]>
    (cherry picked from commit 5ea7a6136a2cebdeac8cb98ea5b33ae7e19ec37b)
    Signed-off-by: Max Gekk <[email protected]>
---
 .../src/main/resources/error/error-conditions.json |   5 +
 .../sql/catalyst/util/SparkDateTimeUtils.scala     |  12 +-
 .../sql/catalyst/util/TimestampFormatter.scala     | 310 ++++++++++++++++++++-
 .../apache/spark/sql/errors/ExecutionErrors.scala  |   6 +
 .../sql/catalyst/util/DateTimeUtilsSuite.scala     |  20 +-
 .../catalyst/util/TimestampFormatterSuite.scala    | 223 ++++++++++++++-
 6 files changed, 567 insertions(+), 9 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-conditions.json 
b/common/utils/src/main/resources/error/error-conditions.json
index 734e0335472b..a680a04b831d 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -8360,6 +8360,11 @@
           "Temporary views cannot be created with the WITH SCHEMA clause. 
Recreate the temporary view when the underlying schema changes, or use a 
persisted view."
         ]
       },
+      "TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER" : {
+        "message" : [
+          "Parsing or formatting nanosecond-precision timestamps 
(TIMESTAMP_LTZ/TIMESTAMP_NTZ with precision in [7, 9]) under the LEGACY time 
parser policy. Set <config> to CORRECTED."
+        ]
+      },
       "TIME_TRAVEL" : {
         "message" : [
           "Time travel on the relation: <relationId>."
diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala
 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala
index 29f280fdd09c..5b08c965c055 100644
--- 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala
@@ -215,14 +215,20 @@ trait SparkDateTimeUtils {
    * The input is the already-extracted `nanosWithinMicro` component 
(`0..999`), so truncation is
    * independent of the epoch sign of the original timestamp value.
    *
-   * Precisions outside `[7, 9]` are passed through unchanged because the 
surrounding timestamp
-   * nanos types validate the bound.
+   * `precision` is expected to originate from a validated 
`TimestampNTZNanosType` /
+   * `TimestampLTZNanosType` (which can only be constructed with `p` in [7, 
9]), so it is not a
+   * user-reachable input here. An out-of-range value therefore indicates an 
internal caller bug
+   * and raises an internal error rather than silently retaining all 
sub-microsecond digits.
    */
   private def truncateNanosWithinMicroToPrecision(nanosWithinMicro: Int, 
precision: Int): Int = {
     precision match {
       case 7 => (nanosWithinMicro / 100) * 100
       case 8 => (nanosWithinMicro / 10) * 10
-      case _ => nanosWithinMicro
+      case 9 => nanosWithinMicro
+      case _ =>
+        throw SparkException.internalError(
+          s"Fractional second precision $precision is out of range " +
+            s"[${TimestampNTZNanosType.MIN_PRECISION}, 
${TimestampNTZNanosType.MAX_PRECISION}].")
     }
   }
 
diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
index f09df4fcbee9..a340e9a3b9b2 100644
--- 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
@@ -29,7 +29,7 @@ import scala.util.control.NonFatal
 
 import org.apache.commons.lang3.time.FastDateFormat
 
-import org.apache.spark.{SparkException, SparkIllegalArgumentException}
+import org.apache.spark.{SparkException, SparkIllegalArgumentException, 
SparkUnsupportedOperationException}
 import org.apache.spark.sql.catalyst.util.DateTimeConstants._
 import org.apache.spark.sql.catalyst.util.LegacyDateFormats.{LegacyDateFormat, 
LENIENT_SIMPLE_DATE_FORMAT}
 import org.apache.spark.sql.catalyst.util.RebaseDateTime._
@@ -38,7 +38,7 @@ import org.apache.spark.sql.errors.ExecutionErrors
 import org.apache.spark.sql.internal.LegacyBehaviorPolicy._
 import org.apache.spark.sql.internal.SqlApiConf
 import org.apache.spark.sql.types.{Decimal, TimestampNTZType}
-import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.unsafe.types.{TimestampNanosVal, UTF8String}
 
 sealed trait TimestampFormatter extends Serializable {
 
@@ -158,6 +158,107 @@ sealed trait TimestampFormatter extends Serializable {
     // did not fail if timestamp contained zone-id or zone-offset component 
and instead ignored it.
     parseWithoutTimeZone(s, true)
 
+  /**
+   * Parses a timestamp in a string and converts it to a [[TimestampNanosVal]] 
(epoch microseconds
+   * plus a sub-microsecond remainder in `[0, 999]`) for 
`TIMESTAMP_LTZ(precision)`. Fractional
+   * digits beyond `precision` are truncated (floored), matching the 
cast/parse rule used by the
+   * microsecond path and `SparkDateTimeUtils`.
+   *
+   * @param s
+   *   \- string with timestamp to parse
+   * @param precision
+   *   \- the target fractional-second precision in `[7, 9]`
+   * @return
+   *   the parsed value as a [[TimestampNanosVal]].
+   */
+  @throws(classOf[ParseException])
+  @throws(classOf[DateTimeParseException])
+  @throws(classOf[DateTimeException])
+  def parseNanos(s: String, precision: Int): TimestampNanosVal
+
+  /**
+   * Optional counterpart of [[parseNanos]]. The result is `None` on invalid 
input.
+   *
+   * Intentionally abstract (unlike the microsecond [[parseOptional]]): a 
swallowing `try`/`catch`
+   * default would also mask the user-facing 
`TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER` error that
+   * the legacy formatters raise from [[parseNanos]], silently returning 
`None`. Each formatter
+   * must decide explicitly.
+   */
+  def parseNanosOptional(s: String, precision: Int): Option[TimestampNanosVal]
+
+  /**
+   * Parses a timestamp in a string and converts it to a [[TimestampNanosVal]] 
for
+   * `TIMESTAMP_NTZ(precision)`. The result is independent of time zones; a 
time zone component is
+   * discarded when `allowTimeZone` is `true` and rejected otherwise. 
Fractional digits beyond
+   * `precision` are truncated (floored).
+   *
+   * @param s
+   *   \- string with timestamp to parse
+   * @param precision
+   *   \- the target fractional-second precision in `[7, 9]`
+   * @param allowTimeZone
+   *   \- indicates strict parsing of timezone
+   * @throws IllegalStateException
+   *   The formatter for timestamp without time zone should always implement 
this method. The
+   *   exception should never be hit.
+   */
+  @throws(classOf[ParseException])
+  @throws(classOf[DateTimeParseException])
+  @throws(classOf[DateTimeException])
+  @throws(classOf[IllegalStateException])
+  def parseWithoutTimeZoneNanos(
+      s: String,
+      precision: Int,
+      allowTimeZone: Boolean): TimestampNanosVal =
+    throw SparkException.internalError(
+      s"The method `parseWithoutTimeZoneNanos(s: String, precision: Int, 
allowTimeZone: " +
+        "Boolean)` should be implemented in the formatter of timestamp without 
time zone")
+
+  /**
+   * Optional counterpart of [[parseWithoutTimeZoneNanos]]. The result is 
`None` on invalid input.
+   *
+   * Intentionally abstract for the same reason as [[parseNanosOptional]]: a 
swallowing default
+   * would mask the legacy formatters' 
`TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER` error.
+   */
+  @throws(classOf[ParseException])
+  @throws(classOf[DateTimeParseException])
+  @throws(classOf[DateTimeException])
+  @throws(classOf[IllegalStateException])
+  def parseWithoutTimeZoneNanosOptional(
+      s: String,
+      precision: Int,
+      allowTimeZone: Boolean): Option[TimestampNanosVal]
+
+  /**
+   * Parses a timestamp in a string to a [[TimestampNanosVal]] for 
`TIMESTAMP_NTZ(precision)`.
+   * Zone-id and zone-offset components are ignored.
+   */
+  @throws(classOf[ParseException])
+  @throws(classOf[DateTimeParseException])
+  @throws(classOf[DateTimeException])
+  @throws(classOf[IllegalStateException])
+  final def parseWithoutTimeZoneNanos(s: String, precision: Int): 
TimestampNanosVal =
+    parseWithoutTimeZoneNanos(s, precision, true)
+
+  /**
+   * Formats a [[TimestampNanosVal]] to a string at the target 
fractional-second `precision` in
+   * `[7, 9]` for `TIMESTAMP_LTZ(precision)`. The value is rendered in the 
formatter's `zoneId`
+   * (it goes through the `format(instant: Instant)` path), so it must not be 
used for NTZ values;
+   * use [[formatWithoutTimeZoneNanos]] for those. Sub-`precision` digits are 
truncated (floored)
+   * before rendering; the number of fractional digits actually emitted 
follows the formatter
+   * pattern (e.g. the count of `S` letters), consistent with the microsecond 
`format` overloads.
+   */
+  def formatNanos(v: TimestampNanosVal, precision: Int): String
+
+  /**
+   * NTZ counterpart of [[formatNanos]]: formats a [[TimestampNanosVal]] for
+   * `TIMESTAMP_NTZ(precision)` independently of any time zone. The value is 
rendered as its
+   * UTC-grid wall-clock local date-time, mirroring the microsecond 
`format(localDateTime:
+   * LocalDateTime)` path; unlike [[formatNanos]] it does not apply the 
formatter's `zoneId`.
+   * Sub-`precision` digits are truncated (floored) before rendering.
+   */
+  def formatWithoutTimeZoneNanos(v: TimestampNanosVal, precision: Int): String
+
   def format(us: Long): String
   def format(ts: Timestamp): String
   def format(instant: Instant): String
@@ -227,6 +328,42 @@ class Iso8601TimestampFormatter(
     } catch checkParsedDiff(s, legacyFormatter.parse)
   }
 
+  // `checkParsedDiff` only uses the legacy parse to decide whether to raise 
an upgrade exception
+  // and never returns its result, so the legacy formatter (microsecond-only) 
is fine here even on
+  // the nanos path. The returned `TimestampNanosVal.ZERO` is discarded.
+  protected def legacyNanosParse(str: String): TimestampNanosVal = {
+    legacyFormatter.parse(str)
+    TimestampNanosVal.ZERO
+  }
+
+  override def parseNanosOptional(s: String, precision: Int): 
Option[TimestampNanosVal] = {
+    try {
+      val parsePosition = new ParsePosition(0)
+      val parsed = formatter.parseUnresolved(s, parsePosition)
+      if (parsed != null && s.length == parsePosition.getIndex) {
+        Some(extractNanos(parsed, precision))
+      } else {
+        None
+      }
+    } catch {
+      case NonFatal(_) => None
+    }
+  }
+
+  private def extractNanos(parsed: TemporalAccessor, precision: Int): 
TimestampNanosVal = {
+    val parsedZoneId = parsed.query(TemporalQueries.zone())
+    val timeZoneId = if (parsedZoneId == null) zoneId else parsedZoneId
+    val zonedDateTime = toZonedDateTime(parsed, timeZoneId)
+    SparkDateTimeUtils.instantToTimestampNanos(zonedDateTime.toInstant, 
precision)
+  }
+
+  override def parseNanos(s: String, precision: Int): TimestampNanosVal = {
+    try {
+      val parsed = formatter.parse(s)
+      extractNanos(parsed, precision)
+    } catch checkParsedDiff(s, legacyNanosParse)
+  }
+
   override def parseWithoutTimeZoneOptional(s: String, allowTimeZone: 
Boolean): Option[Long] = {
     try {
       val parsePosition = new ParsePosition(0)
@@ -260,6 +397,48 @@ class Iso8601TimestampFormatter(
     } catch checkParsedDiff(s, legacyFormatter.parse)
   }
 
+  override def parseWithoutTimeZoneNanosOptional(
+      s: String,
+      precision: Int,
+      allowTimeZone: Boolean): Option[TimestampNanosVal] = {
+    try {
+      val parsePosition = new ParsePosition(0)
+      val parsed = formatter.parseUnresolved(s, parsePosition)
+      if (parsed != null && s.length == parsePosition.getIndex) {
+        Some(extractNanosNTZ(s, parsed, precision, allowTimeZone))
+      } else {
+        None
+      }
+    } catch {
+      case NonFatal(_) => None
+    }
+  }
+
+  private def extractNanosNTZ(
+      s: String,
+      parsed: TemporalAccessor,
+      precision: Int,
+      allowTimeZone: Boolean): TimestampNanosVal = {
+    if (!allowTimeZone && parsed.query(TemporalQueries.zone()) != null) {
+      throw ExecutionErrors.cannotParseStringAsDataTypeError(pattern, s, 
TimestampNTZType)
+    }
+    val localDate = toLocalDate(parsed)
+    val localTime = toLocalTime(parsed)
+    SparkDateTimeUtils.localDateTimeToTimestampNanos(
+      LocalDateTime.of(localDate, localTime),
+      precision)
+  }
+
+  override def parseWithoutTimeZoneNanos(
+      s: String,
+      precision: Int,
+      allowTimeZone: Boolean): TimestampNanosVal = {
+    try {
+      val parsed = formatter.parse(s)
+      extractNanosNTZ(s, parsed, precision, allowTimeZone)
+    } catch checkParsedDiff(s, legacyNanosParse)
+  }
+
   override def format(instant: Instant): String = {
     try {
       zonedFormatter.format(instant)
@@ -280,6 +459,27 @@ class Iso8601TimestampFormatter(
     localDateTime.format(formatter)
   }
 
+  override def formatNanos(v: TimestampNanosVal, precision: Int): String = {
+    // Floor sub-`precision` digits using the shared `SparkDateTimeUtils` 
truncation rule, then
+    // render the reconstructed instant. The number of fractional digits 
emitted follows the
+    // formatter pattern (count of `S` letters), consistent with the 
microsecond `format` paths.
+    val truncated = SparkDateTimeUtils.instantToTimestampNanos(
+      SparkDateTimeUtils.timestampNanosToInstant(v),
+      precision)
+    format(SparkDateTimeUtils.timestampNanosToInstant(truncated))
+  }
+
+  override def formatWithoutTimeZoneNanos(v: TimestampNanosVal, precision: 
Int): String = {
+    // Floor sub-`precision` digits, then render the reconstructed local 
date-time via the
+    // pattern only (no `zoneId`), mirroring `format(localDateTime: 
LocalDateTime)` on the
+    // microsecond path. Routing an NTZ value through `formatNanos` / 
`format(Instant)` would
+    // apply the formatter's `zoneId` and shift the UTC-grid wall clock.
+    val truncated = SparkDateTimeUtils.localDateTimeToTimestampNanos(
+      SparkDateTimeUtils.timestampNanosToLocalDateTime(v),
+      precision)
+    format(SparkDateTimeUtils.timestampNanosToLocalDateTime(truncated))
+  }
+
   override def validatePatternString(checkLegacy: Boolean): Unit = {
     if (checkLegacy) {
       try {
@@ -346,6 +546,44 @@ class DefaultTimestampFormatter(
     val utf8Value = UTF8String.fromString(s)
     SparkDateTimeUtils.stringToTimestampWithoutTimeZone(utf8Value, 
allowTimeZone)
   }
+
+  override def parseNanos(s: String, precision: Int): TimestampNanosVal = {
+    try {
+      SparkDateTimeUtils.stringToTimestampLTZNanosAnsi(
+        UTF8String.fromString(s),
+        precision,
+        zoneId)
+    } catch checkParsedDiff(s, legacyNanosParse)
+  }
+
+  override def parseNanosOptional(s: String, precision: Int): 
Option[TimestampNanosVal] =
+    SparkDateTimeUtils.stringToTimestampLTZNanos(UTF8String.fromString(s), 
precision, zoneId)
+
+  override def parseWithoutTimeZoneNanos(
+      s: String,
+      precision: Int,
+      allowTimeZone: Boolean): TimestampNanosVal = {
+    try {
+      val utf8Value = UTF8String.fromString(s)
+      SparkDateTimeUtils
+        .stringToTimestampNTZNanos(utf8Value, precision, allowTimeZone)
+        .getOrElse {
+          throw ExecutionErrors.cannotParseStringAsDataTypeError(
+            TimestampFormatter.defaultPattern(),
+            s,
+            TimestampNTZType)
+        }
+    } catch checkParsedDiff(s, legacyNanosParse)
+  }
+
+  override def parseWithoutTimeZoneNanosOptional(
+      s: String,
+      precision: Int,
+      allowTimeZone: Boolean): Option[TimestampNanosVal] =
+    SparkDateTimeUtils.stringToTimestampNTZNanos(
+      UTF8String.fromString(s),
+      precision,
+      allowTimeZone)
 }
 
 /**
@@ -491,6 +729,35 @@ class LegacyFastTimestampFormatter(pattern: String, 
zoneId: ZoneId, locale: Loca
     format(instantToMicros(instant))
   }
 
+  override def parseNanos(s: String, precision: Int): TimestampNanosVal =
+    throw TimestampFormatter.legacyNanosUnsupported()
+
+  // The `*Optional` nanos methods are abstract in the trait (no swallowing 
default), so the legacy
+  // formatters must implement them. They throw rather than return `None` so 
the unsupported-feature
+  // error is surfaced instead of being silently masked under the LEGACY time 
parser policy.
+  override def parseNanosOptional(s: String, precision: Int): 
Option[TimestampNanosVal] =
+    throw TimestampFormatter.legacyNanosUnsupported()
+
+  // Without this override the trait default throws 
SparkException.internalError instead of the
+  // user-facing legacyNanosUnsupported error.
+  override def parseWithoutTimeZoneNanos(
+      s: String,
+      precision: Int,
+      allowTimeZone: Boolean): TimestampNanosVal =
+    throw TimestampFormatter.legacyNanosUnsupported()
+
+  override def parseWithoutTimeZoneNanosOptional(
+      s: String,
+      precision: Int,
+      allowTimeZone: Boolean): Option[TimestampNanosVal] =
+    throw TimestampFormatter.legacyNanosUnsupported()
+
+  override def formatNanos(v: TimestampNanosVal, precision: Int): String =
+    throw TimestampFormatter.legacyNanosUnsupported()
+
+  override def formatWithoutTimeZoneNanos(v: TimestampNanosVal, precision: 
Int): String =
+    throw TimestampFormatter.legacyNanosUnsupported()
+
   override def validatePatternString(checkLegacy: Boolean): Unit = 
fastDateFormat
 }
 
@@ -532,6 +799,35 @@ class LegacySimpleTimestampFormatter(
     format(instantToMicros(instant))
   }
 
+  override def parseNanos(s: String, precision: Int): TimestampNanosVal =
+    throw TimestampFormatter.legacyNanosUnsupported()
+
+  // The `*Optional` nanos methods are abstract in the trait (no swallowing 
default), so the legacy
+  // formatters must implement them. They throw rather than return `None` so 
the unsupported-feature
+  // error is surfaced instead of being silently masked under the LEGACY time 
parser policy.
+  override def parseNanosOptional(s: String, precision: Int): 
Option[TimestampNanosVal] =
+    throw TimestampFormatter.legacyNanosUnsupported()
+
+  // Without this override the trait default throws 
SparkException.internalError instead of the
+  // user-facing legacyNanosUnsupported error.
+  override def parseWithoutTimeZoneNanos(
+      s: String,
+      precision: Int,
+      allowTimeZone: Boolean): TimestampNanosVal =
+    throw TimestampFormatter.legacyNanosUnsupported()
+
+  override def parseWithoutTimeZoneNanosOptional(
+      s: String,
+      precision: Int,
+      allowTimeZone: Boolean): Option[TimestampNanosVal] =
+    throw TimestampFormatter.legacyNanosUnsupported()
+
+  override def formatNanos(v: TimestampNanosVal, precision: Int): String =
+    throw TimestampFormatter.legacyNanosUnsupported()
+
+  override def formatWithoutTimeZoneNanos(v: TimestampNanosVal, precision: 
Int): String =
+    throw TimestampFormatter.legacyNanosUnsupported()
+
   override def validatePatternString(checkLegacy: Boolean): Unit = sdf
 }
 
@@ -548,6 +844,16 @@ object TimestampFormatter {
   def defaultPattern(): String =
     s"${DateFormatter.defaultPattern} ${TimeFormatter.defaultPattern}"
 
+  /**
+   * The legacy formatters (`FastDateFormat` / `SimpleDateFormat`) cap at 
millisecond/microsecond
+   * resolution and cannot represent the sub-microsecond remainder of a 
[[TimestampNanosVal]].
+   * Nanosecond-capable timestamp types are therefore unsupported under the 
`LEGACY` time parser
+   * policy. This is a user-facing error (not an internal error) because the 
`LEGACY` policy is
+   * user-configurable and a caller may legitimately combine it with 
nanosecond timestamps.
+   */
+  def legacyNanosUnsupported(): SparkUnsupportedOperationException =
+    ExecutionErrors.nanosTimestampUnsupportedWithLegacyParserError()
+
   private def getFormatter(
       format: Option[String],
       zoneId: ZoneId,
diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/errors/ExecutionErrors.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/errors/ExecutionErrors.scala
index e6e3fd847298..58a82c7270a7 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/errors/ExecutionErrors.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/errors/ExecutionErrors.scala
@@ -53,6 +53,12 @@ private[sql] trait ExecutionErrors extends 
DataTypeErrorsBase {
       e)
   }
 
+  def nanosTimestampUnsupportedWithLegacyParserError(): 
SparkUnsupportedOperationException = {
+    new SparkUnsupportedOperationException(
+      errorClass = 
"UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+      messageParameters = Map("config" -> 
toSQLConf(SqlApiConf.LEGACY_TIME_PARSER_POLICY_KEY)))
+  }
+
   def stateStoreHandleNotInitialized(): SparkRuntimeException = {
     new SparkRuntimeException(
       errorClass = "STATE_STORE_HANDLE_NOT_INITIALIZED",
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
index 47eb4a1e3e3c..2d9793e687a7 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
@@ -26,7 +26,7 @@ import java.util.concurrent.TimeUnit
 import org.scalatest.matchers.must.Matchers
 import org.scalatest.matchers.should.Matchers._
 
-import org.apache.spark.{SparkArithmeticException, SparkDateTimeException, 
SparkFunSuite, SparkIllegalArgumentException}
+import org.apache.spark.{SparkArithmeticException, SparkDateTimeException, 
SparkException, SparkFunSuite, SparkIllegalArgumentException}
 import org.apache.spark.sql.catalyst.plans.SQLHelper
 import org.apache.spark.sql.catalyst.util.DateTimeConstants._
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
@@ -1956,6 +1956,24 @@ class DateTimeUtilsSuite extends SparkFunSuite with 
Matchers with SQLHelper {
     }
   }
 
+  test("SPARK-57162: nanos converters raise an internal error for precision 
outside [7, 9]") {
+    // `precision` is always sourced from a validated 
TimestampNTZNanosType/TimestampLTZNanosType
+    // (constructible only with p in [7, 9]), so an out-of-range value is an 
internal caller bug,
+    // not user input. Both the NTZ (LocalDateTime) and LTZ (Instant) 
converters must reject it.
+    val ldt = LocalDateTime.parse("2019-02-26T16:56:00.123456789")
+    val instant = Instant.parse("2019-02-26T16:56:00.123456789Z")
+    Seq(6, 10).foreach { p =>
+      checkError(
+        exception = 
intercept[SparkException](localDateTimeToTimestampNanos(ldt, p)),
+        condition = "INTERNAL_ERROR",
+        parameters = Map("message" -> s"Fractional second precision $p is out 
of range [7, 9]."))
+      checkError(
+        exception = intercept[SparkException](instantToTimestampNanos(instant, 
p)),
+        condition = "INTERNAL_ERROR",
+        parameters = Map("message" -> s"Fractional second precision $p is out 
of range [7, 9]."))
+    }
+  }
+
   test("SPARK-57033: random roundtrip across precisions floors to the 
precision step") {
     val rnd = new scala.util.Random(0)
     val min = Instant.parse("0001-01-01T00:00:00Z").getEpochSecond
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
index 558d7eda78b4..105fee7193f2 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
@@ -17,15 +17,16 @@
 
 package org.apache.spark.sql.catalyst.util
 
-import java.time.{DateTimeException, LocalDateTime, ZoneId}
+import java.time.{DateTimeException, Instant, LocalDateTime, ZoneId}
 import java.util.Locale
 
-import org.apache.spark.{SparkException, SparkUpgradeException}
+import org.apache.spark.{SparkException, SparkUnsupportedOperationException, 
SparkUpgradeException}
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
 import org.apache.spark.sql.catalyst.util.DateTimeUtils._
 import 
org.apache.spark.sql.catalyst.util.LegacyDateFormats.LENIENT_SIMPLE_DATE_FORMAT
+import org.apache.spark.sql.catalyst.util.TimestampNanosTestUtils._
 import org.apache.spark.sql.internal.{LegacyBehaviorPolicy, SQLConf}
-import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.unsafe.types.{TimestampNanosVal, UTF8String}
 
 class TimestampFormatterSuite extends DatetimeFormatterSuite {
 
@@ -559,4 +560,220 @@ class TimestampFormatterSuite extends 
DatetimeFormatterSuite {
           "'yyyy-MM-dd HH:mm:ss' as the target spark data type 
\"TIMESTAMP_NTZ\"."))
     )
   }
+
+  // The expected LTZ value: floor the sub-`precision` fractional digits, then 
split into
+  // (epochMicros, nanosWithinMicro). Mirrors 
`SparkDateTimeUtils.instantToTimestampNanos`.
+  private def expectedLTZNanos(instant: Instant, precision: Int): 
TimestampNanosVal = {
+    val truncatedNano = nanoOfSecTruncator(precision)(instant.getNano)
+    instantToNanosVal(Instant.ofEpochSecond(instant.getEpochSecond, 
truncatedNano.toLong))
+  }
+
+  // The expected NTZ value (interpreted at UTC), with sub-`precision` digits 
floored.
+  private def expectedNTZNanos(ldt: LocalDateTime, precision: Int): 
TimestampNanosVal = {
+    
localDateTimeToNanosVal(ldt.withNano(nanoOfSecTruncator(precision)(ldt.getNano)))
+  }
+
+  private val nanosPattern = "yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS"
+
+  test("SPARK-57162: Iso8601 formatter parses strings into TimestampNanosVal 
(LTZ)") {
+    outstandingZoneIds.foreach { zoneId =>
+      val formatter = TimestampFormatter(nanosPattern, zoneId, isParsing = 
true)
+      foreachNanosPrecision { precision =>
+        specialNanosTs.foreach { ts =>
+          val input = ts.replace(' ', 'T')
+          val expected = expectedLTZNanos(parseSpecialNanosLTZ(ts, zoneId), 
precision)
+          assert(formatter.parseNanos(input, precision) === expected)
+          assert(formatter.parseNanosOptional(input, 
precision).contains(expected))
+        }
+      }
+    }
+  }
+
+  test("SPARK-57162: Iso8601 formatter parses strings into TimestampNanosVal 
(NTZ)") {
+    // NTZ values are zone-independent (interpreted at UTC), so a single 
formatter zone suffices.
+    val formatter = TimestampFormatter(nanosPattern, UTC, isParsing = true)
+    foreachNanosPrecision { precision =>
+      specialNanosTs.foreach { ts =>
+        val input = ts.replace(' ', 'T')
+        val expected = expectedNTZNanos(parseSpecialNanosNTZ(ts), precision)
+        assert(formatter.parseWithoutTimeZoneNanos(input, precision) === 
expected)
+        assert(formatter.parseWithoutTimeZoneNanos(input, precision, 
allowTimeZone = true) ===
+          expected)
+        assert(formatter.parseWithoutTimeZoneNanosOptional(input, precision, 
allowTimeZone = true)
+          .contains(expected))
+      }
+    }
+  }
+
+  test("SPARK-57162: round-trip TimestampNanosVal -> string -> 
TimestampNanosVal") {
+    outstandingZoneIds.foreach { zoneId =>
+      val parser = TimestampFormatter(nanosPattern, zoneId, isParsing = true)
+      val printer = TimestampFormatter(nanosPattern, zoneId, isParsing = false)
+      foreachNanosPrecision { precision =>
+        specialNanosTs.foreach { ts =>
+          val value = expectedLTZNanos(parseSpecialNanosLTZ(ts, zoneId), 
precision)
+          val formatted = printer.formatNanos(value, precision)
+          assert(parser.parseNanos(formatted, precision) === value)
+        }
+      }
+    }
+  }
+
+  test("SPARK-57162: sub-precision fractional digits are truncated on parse") {
+    val formatter = TimestampFormatter(nanosPattern, UTC, isParsing = true)
+    val input = "1970-01-01T00:00:00.123456789"
+    Seq(
+      9 -> nanosVal(123456L, 789),
+      8 -> nanosVal(123456L, 780),
+      7 -> nanosVal(123456L, 700)).foreach { case (precision, expected) =>
+      assert(formatter.parseNanos(input, precision) === expected)
+      assert(formatter.parseWithoutTimeZoneNanos(input, precision) === 
expected)
+    }
+  }
+
+  test("SPARK-57162: formatNanos truncates to precision and renders per 
pattern width") {
+    val value = nanosVal(123456L, 789) // 1970-01-01 00:00:00.123456789 at UTC
+    val fixed = TimestampFormatter(nanosPattern, UTC, isParsing = false)
+    assert(fixed.formatNanos(value, 9) === "1970-01-01T00:00:00.123456789")
+    assert(fixed.formatNanos(value, 8) === "1970-01-01T00:00:00.123456780")
+    assert(fixed.formatNanos(value, 7) === "1970-01-01T00:00:00.123456700")
+
+    // The fraction formatter omits trailing zeros.
+    val fraction = TimestampFormatter.getFractionFormatter(UTC)
+    assert(fraction.formatNanos(value, 9) === "1970-01-01 00:00:00.123456789")
+    assert(fraction.formatNanos(value, 8) === "1970-01-01 00:00:00.12345678")
+    assert(fraction.formatNanos(value, 7) === "1970-01-01 00:00:00.1234567")
+  }
+
+  test("SPARK-57162: formatWithoutTimeZoneNanos is zone-independent (NTZ)") {
+    // Regression guard for an LTZ-only `formatNanos`: with a non-UTC formatter zone, the NTZ
+    // method must render the UTC-grid wall clock unchanged, whereas `formatNanos` (LTZ) routes
+    // through `format(Instant)` and shifts the value into the zone. All-UTC NTZ cases miss this.
+    val value = nanosVal(123456L, 789) // wall clock 1970-01-01 00:00:00.123456789 on the UTC grid
+    val zone = getZoneId("+01:00")
+    val printer = TimestampFormatter(nanosPattern, zone, isParsing = false)
+    // The 9-`S` pattern always emits 9 fractional digits; truncation zeros the low ones.
+    Seq(
+      9 -> "1970-01-01T00:00:00.123456789",
+      8 -> "1970-01-01T00:00:00.123456780",
+      7 -> "1970-01-01T00:00:00.123456700").foreach { case (precision, expectedNtz) =>
+      assert(printer.formatWithoutTimeZoneNanos(value, precision) === expectedNtz)
+    }
+    // LTZ rendering of the same value is shifted by the +01:00 offset.
+    assert(printer.formatNanos(value, 9) === "1970-01-01T01:00:00.123456789")
+    // The NTZ output round-trips through the matching NTZ parser regardless of formatter zone.
+    val parser = TimestampFormatter(nanosPattern, zone, isParsing = true)
+    assert(parser.parseWithoutTimeZoneNanos(
+      printer.formatWithoutTimeZoneNanos(value, 9), 9) === value)
+  }
+
+  test("SPARK-57162: NTZ nanos parse rejects a time zone when not allowed") {
+    val formatter = TimestampFormatter(
+      "yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSSXXX",
+      UTC,
+      isParsing = true)
+    val input = "2018-12-02T10:11:12.123456789+01:00"
+    // When the zone component is allowed it is discarded and the local fields are kept.
+    val expected = expectedNTZNanos(LocalDateTime.of(2018, 12, 2, 10, 11, 12, 123456789), 9)
+    assert(formatter.parseWithoutTimeZoneNanos(input, 9, allowTimeZone = true) === expected)
+
+    intercept[SparkException] {
+      formatter.parseWithoutTimeZoneNanos(input, 9, allowTimeZone = false)
+    }
+    assert(formatter.parseWithoutTimeZoneNanosOptional(input, 9, allowTimeZone = false).isEmpty)
+  }
+
+  test("SPARK-57162: DefaultTimestampFormatter parses nanos without a pattern") {
+    outstandingZoneIds.foreach { zoneId =>
+      val formatter = new DefaultTimestampFormatter(
+        zoneId,
+        locale = DateFormatter.defaultLocale,
+        legacyFormat = LegacyDateFormats.SIMPLE_DATE_FORMAT,
+        isParsing = true)
+      val ldt = LocalDateTime.of(2021, 8, 12, 18, 31, 50, 123456789)
+      val input = "2021-08-12T18:31:50.123456789"
+      foreachNanosPrecision { precision =>
+        val expectedLtz = expectedLTZNanos(ldt.atZone(zoneId).toInstant, precision)
+        assert(formatter.parseNanos(input, precision) === expectedLtz)
+        assert(formatter.parseNanosOptional(input, precision).contains(expectedLtz))
+        val expectedNtz = expectedNTZNanos(ldt, precision)
+        assert(formatter.parseWithoutTimeZoneNanos(input, precision) === expectedNtz)
+        assert(formatter.parseWithoutTimeZoneNanosOptional(input, precision, allowTimeZone = true)
+          .contains(expectedNtz))
+      }
+      assert(formatter.parseNanosOptional("x123", 9).isEmpty)
+      assert(formatter.parseWithoutTimeZoneNanosOptional("x123", 9, allowTimeZone = true).isEmpty)
+    }
+  }
+
+  test("SPARK-57162: DefaultTimestampFormatter.formatNanos uses the default pattern (no fracs)") {
+    // DefaultTimestampFormatter inherits Iso8601TimestampFormatter.formatNanos, which renders via
+    // the default pattern "yyyy-MM-dd HH:mm:ss". That pattern has no S fields, so sub-second
+    // digits are not emitted. This is expected behaviour: DefaultTimestampFormatter is
+    // parse-oriented and callers that need fractional output should use FractionTimestampFormatter.
+    val formatter = new DefaultTimestampFormatter(
+      UTC,
+      locale = DateFormatter.defaultLocale,
+      legacyFormat = LegacyDateFormats.SIMPLE_DATE_FORMAT,
+      isParsing = false)
+    val value = nanosVal(123456L, 789) // 1970-01-01 00:00:00.123456789 UTC
+    assert(formatter.formatNanos(value, 9) === "1970-01-01 00:00:00")
+    assert(formatter.formatNanos(value, 7) === "1970-01-01 00:00:00")
+  }
+
+  test("SPARK-57162: legacy formatters reject nanosecond precision") {
+    val fast = new LegacyFastTimestampFormatter(
+      "yyyy-MM-dd HH:mm:ss.SSSSSS",
+      zoneId = UTC,
+      locale = DateFormatter.defaultLocale)
+    val simple = new LegacySimpleTimestampFormatter(
+      "yyyy-MM-dd HH:mm:ss.SSSSSS",
+      zoneId = UTC,
+      locale = DateFormatter.defaultLocale)
+    val expectedParameters = Map(
+      "config" -> ("\"" + SQLConf.LEGACY_TIME_PARSER_POLICY.key + "\""))
+    Seq[TimestampFormatter](fast, simple).foreach { formatter =>
+      checkError(
+        exception = intercept[SparkUnsupportedOperationException] {
+          formatter.parseNanos("2020-01-01 00:00:00.123456789", 9)
+        },
+        condition = "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+        parameters = expectedParameters)
+      // The optional variants must surface the unsupported-feature error too, not swallow it and
+      // return None. Their counterparts are abstract in the trait specifically to force this.
+      checkError(
+        exception = intercept[SparkUnsupportedOperationException] {
+          formatter.parseNanosOptional("2020-01-01 00:00:00.123456789", 9)
+        },
+        condition = "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+        parameters = expectedParameters)
+      checkError(
+        exception = intercept[SparkUnsupportedOperationException] {
+          formatter.formatNanos(nanosVal(0L, 1), 9)
+        },
+        condition = "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+        parameters = expectedParameters)
+      checkError(
+        exception = intercept[SparkUnsupportedOperationException] {
+          formatter.parseWithoutTimeZoneNanos("2020-01-01 00:00:00.123456789", 9)
+        },
+        condition = "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+        parameters = expectedParameters)
+      checkError(
+        exception = intercept[SparkUnsupportedOperationException] {
+          formatter.parseWithoutTimeZoneNanosOptional(
+            "2020-01-01 00:00:00.123456789",
+            9,
+            allowTimeZone = true)
+        },
+        condition = "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+        parameters = expectedParameters)
+      checkError(
+        exception = intercept[SparkUnsupportedOperationException] {
+          formatter.formatWithoutTimeZoneNanos(nanosVal(0L, 1), 9)
+        },
+        condition = "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+        parameters = expectedParameters)
+    }
+  }
 }

