This is an automated email from the ASF dual-hosted git repository.
MaxGekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 5ea7a6136a2c [SPARK-57162][SQL] Add nanosecond-aware
TimestampFormatter for parsing and formatting TimestampNanosVal
5ea7a6136a2c is described below
commit 5ea7a6136a2cebdeac8cb98ea5b33ae7e19ec37b
Author: Maxim Gekk <[email protected]>
AuthorDate: Thu Jun 4 09:02:35 2026 +0200
[SPARK-57162][SQL] Add nanosecond-aware TimestampFormatter for parsing and
formatting TimestampNanosVal
### What changes were proposed in this pull request?
Extend the `TimestampFormatter` family with additive, nanosecond-aware
parse and format methods that produce and consume
`org.apache.spark.unsafe.types.TimestampNanosVal` (`epochMicros: Long` +
`nanosWithinMicro: Short` in `[0, 999]`) at a target fractional precision `p`
in `[7, 9]`:
- New trait methods: `parseNanos` / `parseNanosOptional` (LTZ),
`parseWithoutTimeZoneNanos` / `parseWithoutTimeZoneNanosOptional` (NTZ, plus a
`final` overload that passes `allowTimeZone = true`), and `formatNanos` (LTZ) /
`formatWithoutTimeZoneNanos` (NTZ).
- `Iso8601TimestampFormatter`: `extractNanos` / `extractNanosNTZ` build the
`Instant` / `LocalDateTime` and delegate to
`SparkDateTimeUtils.instantToTimestampNanos` / `localDateTimeToTimestampNanos`;
`formatNanos` floors sub-`precision` digits and renders the reconstructed
instant.
- `DefaultTimestampFormatter`: delegates to the SPARK-57032 nanos entry
points.
- `LegacyFastTimestampFormatter` / `LegacySimpleTimestampFormatter`:
explicitly reject nanosecond precision under the `LEGACY` time parser policy
(they cap at micro resolution).
Sub-precision fractional digits are truncated (floored), consistent with
SPARK-57032. All existing microsecond methods are unchanged (additive API).
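The floor rule can be sketched with plain `java.time`. This is a minimal illustration, not the Spark implementation: `NanosVal` is a hypothetical stand-in for `TimestampNanosVal`, and `TruncationSketch` is an assumed name. Only the `precision match` arms mirror the `truncateNanosWithinMicroToPrecision` helper shown in the diff.

```scala
import java.time.Instant

// Hypothetical stand-in for TimestampNanosVal; not the Spark class.
final case class NanosVal(epochMicros: Long, nanosWithinMicro: Short)

object TruncationSketch {
  // Floors the sub-microsecond remainder (0..999) to the target precision.
  def truncate(nanosWithinMicro: Int, precision: Int): Int = precision match {
    case 7 => (nanosWithinMicro / 100) * 100
    case 8 => (nanosWithinMicro / 10) * 10
    case 9 => nanosWithinMicro
    case _ =>
      throw new IllegalArgumentException(s"precision $precision is out of range [7, 9]")
  }

  // Splits an Instant into (epochMicros, nanosWithinMicro) and floors the remainder.
  // Instant.getNano is always in [0, 999999999], also for pre-epoch instants, so the
  // remainder is non-negative and truncation does not depend on the epoch sign.
  def fromInstant(instant: Instant, precision: Int): NanosVal = {
    val microsOfSecond = instant.getNano / 1000
    val nanosWithinMicro = instant.getNano % 1000
    val epochMicros = Math.addExact(
      Math.multiplyExact(instant.getEpochSecond, 1000000L),
      microsOfSecond.toLong)
    NanosVal(epochMicros, truncate(nanosWithinMicro, precision).toShort)
  }
}
```

For `1970-01-01T00:00:00.123456789Z`, the sketch keeps `epochMicros = 123456` at every precision and yields a remainder of `789`, `780`, and `700` for `p` = 9, 8, and 7, which matches the truncation test case in `TimestampFormatterSuite`.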
### Why are the changes needed?
Today `TimestampFormatter` is microsecond-only and discards the 7th-9th
fractional digits. The JSON and CSV datasources drive all timestamp
parsing/formatting through `TimestampFormatter`, so they cannot round-trip 7-9
digit fractions until the formatter is nanos-aware. This is the foundational
unblocker for nanosecond support in those datasources (parent: SPARK-56822).
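The loss can be reproduced outside Spark. The sketch below assumes nothing beyond `java.time`; `MicrosLossSketch` and its two helpers are hypothetical names that only imitate a microsecond-only round trip and are not part of the formatter API.

```scala
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

object MicrosLossSketch {
  private val epoch: LocalDateTime = LocalDateTime.of(1970, 1, 1, 0, 0, 0)

  // Counts whole microseconds since the epoch; anything finer is dropped here.
  def toMicros(ldt: LocalDateTime): Long = ChronoUnit.MICROS.between(epoch, ldt)

  // Rebuilds a local date-time from a microsecond count.
  def fromMicros(us: Long): LocalDateTime = epoch.plus(us, ChronoUnit.MICROS)

  def main(args: Array[String]): Unit = {
    val original = LocalDateTime.parse("2021-08-12T18:31:50.123456789")
    val roundTripped = fromMicros(toMicros(original))
    println(original)     // nine fractional digits
    println(roundTripped) // the 7th-9th digits are gone
  }
}
```

A value carrying the sub-microsecond remainder separately, as `TimestampNanosVal` does, avoids this loss because the last three digits are no longer forced through a microsecond count.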
### Does this PR introduce _any_ user-facing change?
No. The new formatter API is additive, and its callers gate its use behind
`spark.sql.timestampNanosTypes.enabled`.
### How was this patch tested?
New cases in `TimestampFormatterSuite`: parse/format round-trip for `p` in
`[7, 9]` across ISO default and custom patterns (LTZ and NTZ); boundary values
(`nanosWithinMicro` 0 and 999, pre-epoch instants, the 0001/1582/1970/9999
corpus); truncation rule; NTZ time-zone rejection; and LEGACY-mode rejection.
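The tests also check that the number of emitted fractional digits follows the pattern rather than `p`. A rough `java.time` analogue is sketched below. It is an assumption that the two formatters here behave like Spark's fixed-pattern and fraction formatters; `PatternWidthSketch` is a hypothetical name, and `floored` stands for a value already truncated to `p = 7`.

```scala
import java.time.LocalDateTime
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.time.temporal.ChronoField
import java.util.Locale

object PatternWidthSketch {
  // Nine `S` letters: always emits nine fractional digits, zero-padded.
  val fixed: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS", Locale.ROOT)

  // Variable-width fraction (0 to 9 digits): trailing zeros are omitted.
  val fraction: DateTimeFormatter = new DateTimeFormatterBuilder()
    .appendPattern("yyyy-MM-dd HH:mm:ss")
    .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
    .toFormatter(Locale.ROOT)

  // 1970-01-01 00:00:00.123456789 with the last two digits already floored (p = 7).
  val floored: LocalDateTime = LocalDateTime.of(1970, 1, 1, 0, 0, 0, 123456700)

  def main(args: Array[String]): Unit = {
    println(fixed.format(floored))    // zero-padded to nine digits
    println(fraction.format(floored)) // seven digits, no trailing zeros
  }
}
```

Truncation and rendering width are therefore independent: flooring decides which digits survive, and the pattern decides how many positions are printed.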
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)
Closes #56295 from MaxGekk/spark-57162-nanos-timestamp-formatter.
Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
---
.../src/main/resources/error/error-conditions.json | 5 +
.../sql/catalyst/util/SparkDateTimeUtils.scala | 12 +-
.../sql/catalyst/util/TimestampFormatter.scala | 310 ++++++++++++++++++++-
.../apache/spark/sql/errors/ExecutionErrors.scala | 6 +
.../sql/catalyst/util/DateTimeUtilsSuite.scala | 20 +-
.../catalyst/util/TimestampFormatterSuite.scala | 223 ++++++++++++++-
6 files changed, 567 insertions(+), 9 deletions(-)
diff --git a/common/utils/src/main/resources/error/error-conditions.json
b/common/utils/src/main/resources/error/error-conditions.json
index cefa7ae1d06f..4fcca01fba44 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -8365,6 +8365,11 @@
"Temporary views cannot be created with the WITH SCHEMA clause.
Recreate the temporary view when the underlying schema changes, or use a
persisted view."
]
},
+ "TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER" : {
+ "message" : [
+ "Parsing or formatting nanosecond-precision timestamps (TIMESTAMP_LTZ/TIMESTAMP_NTZ with precision in [7, 9]) under the LEGACY time parser policy. Set <config> to CORRECTED."
+ ]
+ },
"TIME_TRAVEL" : {
"message" : [
"Time travel on the relation: <relationId>."
diff --git
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala
index 29f280fdd09c..5b08c965c055 100644
---
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala
+++
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala
@@ -215,14 +215,20 @@ trait SparkDateTimeUtils {
* The input is the already-extracted `nanosWithinMicro` component
(`0..999`), so truncation is
* independent of the epoch sign of the original timestamp value.
*
- * Precisions outside `[7, 9]` are passed through unchanged because the
surrounding timestamp
- * nanos types validate the bound.
+ * `precision` is expected to originate from a validated
`TimestampNTZNanosType` /
+ * `TimestampLTZNanosType` (which can only be constructed with `p` in [7,
9]), so it is not a
+ * user-reachable input here. An out-of-range value therefore indicates an
internal caller bug
+ * and raises an internal error rather than silently retaining all
sub-microsecond digits.
*/
private def truncateNanosWithinMicroToPrecision(nanosWithinMicro: Int,
precision: Int): Int = {
precision match {
case 7 => (nanosWithinMicro / 100) * 100
case 8 => (nanosWithinMicro / 10) * 10
- case _ => nanosWithinMicro
+ case 9 => nanosWithinMicro
+ case _ =>
+ throw SparkException.internalError(
+ s"Fractional second precision $precision is out of range " +
+ s"[${TimestampNTZNanosType.MIN_PRECISION},
${TimestampNTZNanosType.MAX_PRECISION}].")
}
}
diff --git
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
index f09df4fcbee9..a340e9a3b9b2 100644
---
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
+++
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
@@ -29,7 +29,7 @@ import scala.util.control.NonFatal
import org.apache.commons.lang3.time.FastDateFormat
-import org.apache.spark.{SparkException, SparkIllegalArgumentException}
+import org.apache.spark.{SparkException, SparkIllegalArgumentException,
SparkUnsupportedOperationException}
import org.apache.spark.sql.catalyst.util.DateTimeConstants._
import org.apache.spark.sql.catalyst.util.LegacyDateFormats.{LegacyDateFormat,
LENIENT_SIMPLE_DATE_FORMAT}
import org.apache.spark.sql.catalyst.util.RebaseDateTime._
@@ -38,7 +38,7 @@ import org.apache.spark.sql.errors.ExecutionErrors
import org.apache.spark.sql.internal.LegacyBehaviorPolicy._
import org.apache.spark.sql.internal.SqlApiConf
import org.apache.spark.sql.types.{Decimal, TimestampNTZType}
-import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.unsafe.types.{TimestampNanosVal, UTF8String}
sealed trait TimestampFormatter extends Serializable {
@@ -158,6 +158,107 @@ sealed trait TimestampFormatter extends Serializable {
// did not fail if timestamp contained zone-id or zone-offset component
and instead ignored it.
parseWithoutTimeZone(s, true)
+ /**
+ * Parses a timestamp in a string and converts it to a [[TimestampNanosVal]]
(epoch microseconds
+ * plus a sub-microsecond remainder in `[0, 999]`) for
`TIMESTAMP_LTZ(precision)`. Fractional
+ * digits beyond `precision` are truncated (floored), matching the
cast/parse rule used by the
+ * microsecond path and `SparkDateTimeUtils`.
+ *
+ * @param s
+ * \- string with timestamp to parse
+ * @param precision
+ * \- the target fractional-second precision in `[7, 9]`
+ * @return
+ * the parsed value as a [[TimestampNanosVal]].
+ */
+ @throws(classOf[ParseException])
+ @throws(classOf[DateTimeParseException])
+ @throws(classOf[DateTimeException])
+ def parseNanos(s: String, precision: Int): TimestampNanosVal
+
+ /**
+ * Optional counterpart of [[parseNanos]]. The result is `None` on invalid
input.
+ *
+ * Intentionally abstract (unlike the microsecond [[parseOptional]]): a
swallowing `try`/`catch`
+ * default would also mask the user-facing
`TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER` error that
+ * the legacy formatters raise from [[parseNanos]], silently returning
`None`. Each formatter
+ * must decide explicitly.
+ */
+ def parseNanosOptional(s: String, precision: Int): Option[TimestampNanosVal]
+
+ /**
+ * Parses a timestamp in a string and converts it to a [[TimestampNanosVal]]
for
+ * `TIMESTAMP_NTZ(precision)`. The result is independent of time zones; a
time zone component is
+ * discarded when `allowTimeZone` is `true` and rejected otherwise.
Fractional digits beyond
+ * `precision` are truncated (floored).
+ *
+ * @param s
+ * \- string with timestamp to parse
+ * @param precision
+ * \- the target fractional-second precision in `[7, 9]`
+ * @param allowTimeZone
+ * \- indicates strict parsing of timezone
+ * @throws IllegalStateException
+ * The formatter for timestamp without time zone should always implement
this method. The
+ * exception should never be hit.
+ */
+ @throws(classOf[ParseException])
+ @throws(classOf[DateTimeParseException])
+ @throws(classOf[DateTimeException])
+ @throws(classOf[IllegalStateException])
+ def parseWithoutTimeZoneNanos(
+ s: String,
+ precision: Int,
+ allowTimeZone: Boolean): TimestampNanosVal =
+ throw SparkException.internalError(
+ s"The method `parseWithoutTimeZoneNanos(s: String, precision: Int,
allowTimeZone: " +
+ "Boolean)` should be implemented in the formatter of timestamp without
time zone")
+
+ /**
+ * Optional counterpart of [[parseWithoutTimeZoneNanos]]. The result is
`None` on invalid input.
+ *
+ * Intentionally abstract for the same reason as [[parseNanosOptional]]: a
swallowing default
+ * would mask the legacy formatters'
`TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER` error.
+ */
+ @throws(classOf[ParseException])
+ @throws(classOf[DateTimeParseException])
+ @throws(classOf[DateTimeException])
+ @throws(classOf[IllegalStateException])
+ def parseWithoutTimeZoneNanosOptional(
+ s: String,
+ precision: Int,
+ allowTimeZone: Boolean): Option[TimestampNanosVal]
+
+ /**
+ * Parses a timestamp in a string to a [[TimestampNanosVal]] for
`TIMESTAMP_NTZ(precision)`.
+ * Zone-id and zone-offset components are ignored.
+ */
+ @throws(classOf[ParseException])
+ @throws(classOf[DateTimeParseException])
+ @throws(classOf[DateTimeException])
+ @throws(classOf[IllegalStateException])
+ final def parseWithoutTimeZoneNanos(s: String, precision: Int):
TimestampNanosVal =
+ parseWithoutTimeZoneNanos(s, precision, true)
+
+ /**
+ * Formats a [[TimestampNanosVal]] to a string at the target
fractional-second `precision` in
+ * `[7, 9]` for `TIMESTAMP_LTZ(precision)`. The value is rendered in the
formatter's `zoneId`
+ * (it goes through the `format(instant: Instant)` path), so it must not be
used for NTZ values;
+ * use [[formatWithoutTimeZoneNanos]] for those. Sub-`precision` digits are
truncated (floored)
+ * before rendering; the number of fractional digits actually emitted
follows the formatter
+ * pattern (e.g. the count of `S` letters), consistent with the microsecond
`format` overloads.
+ */
+ def formatNanos(v: TimestampNanosVal, precision: Int): String
+
+ /**
+ * NTZ counterpart of [[formatNanos]]: formats a [[TimestampNanosVal]] for
+ * `TIMESTAMP_NTZ(precision)` independently of any time zone. The value is
rendered as its
+ * UTC-grid wall-clock local date-time, mirroring the microsecond
`format(localDateTime:
+ * LocalDateTime)` path; unlike [[formatNanos]] it does not apply the
formatter's `zoneId`.
+ * Sub-`precision` digits are truncated (floored) before rendering.
+ */
+ def formatWithoutTimeZoneNanos(v: TimestampNanosVal, precision: Int): String
+
def format(us: Long): String
def format(ts: Timestamp): String
def format(instant: Instant): String
@@ -227,6 +328,42 @@ class Iso8601TimestampFormatter(
} catch checkParsedDiff(s, legacyFormatter.parse)
}
+ // `checkParsedDiff` only uses the legacy parse to decide whether to raise
an upgrade exception
+ // and never returns its result, so the legacy formatter (microsecond-only)
is fine here even on
+ // the nanos path. The returned `TimestampNanosVal.ZERO` is discarded.
+ protected def legacyNanosParse(str: String): TimestampNanosVal = {
+ legacyFormatter.parse(str)
+ TimestampNanosVal.ZERO
+ }
+
+ override def parseNanosOptional(s: String, precision: Int):
Option[TimestampNanosVal] = {
+ try {
+ val parsePosition = new ParsePosition(0)
+ val parsed = formatter.parseUnresolved(s, parsePosition)
+ if (parsed != null && s.length == parsePosition.getIndex) {
+ Some(extractNanos(parsed, precision))
+ } else {
+ None
+ }
+ } catch {
+ case NonFatal(_) => None
+ }
+ }
+
+ private def extractNanos(parsed: TemporalAccessor, precision: Int):
TimestampNanosVal = {
+ val parsedZoneId = parsed.query(TemporalQueries.zone())
+ val timeZoneId = if (parsedZoneId == null) zoneId else parsedZoneId
+ val zonedDateTime = toZonedDateTime(parsed, timeZoneId)
+ SparkDateTimeUtils.instantToTimestampNanos(zonedDateTime.toInstant,
precision)
+ }
+
+ override def parseNanos(s: String, precision: Int): TimestampNanosVal = {
+ try {
+ val parsed = formatter.parse(s)
+ extractNanos(parsed, precision)
+ } catch checkParsedDiff(s, legacyNanosParse)
+ }
+
override def parseWithoutTimeZoneOptional(s: String, allowTimeZone:
Boolean): Option[Long] = {
try {
val parsePosition = new ParsePosition(0)
@@ -260,6 +397,48 @@ class Iso8601TimestampFormatter(
} catch checkParsedDiff(s, legacyFormatter.parse)
}
+ override def parseWithoutTimeZoneNanosOptional(
+ s: String,
+ precision: Int,
+ allowTimeZone: Boolean): Option[TimestampNanosVal] = {
+ try {
+ val parsePosition = new ParsePosition(0)
+ val parsed = formatter.parseUnresolved(s, parsePosition)
+ if (parsed != null && s.length == parsePosition.getIndex) {
+ Some(extractNanosNTZ(s, parsed, precision, allowTimeZone))
+ } else {
+ None
+ }
+ } catch {
+ case NonFatal(_) => None
+ }
+ }
+
+ private def extractNanosNTZ(
+ s: String,
+ parsed: TemporalAccessor,
+ precision: Int,
+ allowTimeZone: Boolean): TimestampNanosVal = {
+ if (!allowTimeZone && parsed.query(TemporalQueries.zone()) != null) {
+ throw ExecutionErrors.cannotParseStringAsDataTypeError(pattern, s,
TimestampNTZType)
+ }
+ val localDate = toLocalDate(parsed)
+ val localTime = toLocalTime(parsed)
+ SparkDateTimeUtils.localDateTimeToTimestampNanos(
+ LocalDateTime.of(localDate, localTime),
+ precision)
+ }
+
+ override def parseWithoutTimeZoneNanos(
+ s: String,
+ precision: Int,
+ allowTimeZone: Boolean): TimestampNanosVal = {
+ try {
+ val parsed = formatter.parse(s)
+ extractNanosNTZ(s, parsed, precision, allowTimeZone)
+ } catch checkParsedDiff(s, legacyNanosParse)
+ }
+
override def format(instant: Instant): String = {
try {
zonedFormatter.format(instant)
@@ -280,6 +459,27 @@ class Iso8601TimestampFormatter(
localDateTime.format(formatter)
}
+ override def formatNanos(v: TimestampNanosVal, precision: Int): String = {
+ // Floor sub-`precision` digits using the shared `SparkDateTimeUtils`
truncation rule, then
+ // render the reconstructed instant. The number of fractional digits
emitted follows the
+ // formatter pattern (count of `S` letters), consistent with the
microsecond `format` paths.
+ val truncated = SparkDateTimeUtils.instantToTimestampNanos(
+ SparkDateTimeUtils.timestampNanosToInstant(v),
+ precision)
+ format(SparkDateTimeUtils.timestampNanosToInstant(truncated))
+ }
+
+ override def formatWithoutTimeZoneNanos(v: TimestampNanosVal, precision:
Int): String = {
+ // Floor sub-`precision` digits, then render the reconstructed local
date-time via the
+ // pattern only (no `zoneId`), mirroring `format(localDateTime:
LocalDateTime)` on the
+ // microsecond path. Routing an NTZ value through `formatNanos` /
`format(Instant)` would
+ // apply the formatter's `zoneId` and shift the UTC-grid wall clock.
+ val truncated = SparkDateTimeUtils.localDateTimeToTimestampNanos(
+ SparkDateTimeUtils.timestampNanosToLocalDateTime(v),
+ precision)
+ format(SparkDateTimeUtils.timestampNanosToLocalDateTime(truncated))
+ }
+
override def validatePatternString(checkLegacy: Boolean): Unit = {
if (checkLegacy) {
try {
@@ -346,6 +546,44 @@ class DefaultTimestampFormatter(
val utf8Value = UTF8String.fromString(s)
SparkDateTimeUtils.stringToTimestampWithoutTimeZone(utf8Value,
allowTimeZone)
}
+
+ override def parseNanos(s: String, precision: Int): TimestampNanosVal = {
+ try {
+ SparkDateTimeUtils.stringToTimestampLTZNanosAnsi(
+ UTF8String.fromString(s),
+ precision,
+ zoneId)
+ } catch checkParsedDiff(s, legacyNanosParse)
+ }
+
+ override def parseNanosOptional(s: String, precision: Int):
Option[TimestampNanosVal] =
+ SparkDateTimeUtils.stringToTimestampLTZNanos(UTF8String.fromString(s),
precision, zoneId)
+
+ override def parseWithoutTimeZoneNanos(
+ s: String,
+ precision: Int,
+ allowTimeZone: Boolean): TimestampNanosVal = {
+ try {
+ val utf8Value = UTF8String.fromString(s)
+ SparkDateTimeUtils
+ .stringToTimestampNTZNanos(utf8Value, precision, allowTimeZone)
+ .getOrElse {
+ throw ExecutionErrors.cannotParseStringAsDataTypeError(
+ TimestampFormatter.defaultPattern(),
+ s,
+ TimestampNTZType)
+ }
+ } catch checkParsedDiff(s, legacyNanosParse)
+ }
+
+ override def parseWithoutTimeZoneNanosOptional(
+ s: String,
+ precision: Int,
+ allowTimeZone: Boolean): Option[TimestampNanosVal] =
+ SparkDateTimeUtils.stringToTimestampNTZNanos(
+ UTF8String.fromString(s),
+ precision,
+ allowTimeZone)
}
/**
@@ -491,6 +729,35 @@ class LegacyFastTimestampFormatter(pattern: String,
zoneId: ZoneId, locale: Loca
format(instantToMicros(instant))
}
+ override def parseNanos(s: String, precision: Int): TimestampNanosVal =
+ throw TimestampFormatter.legacyNanosUnsupported()
+
+ // The `*Optional` nanos methods are abstract in the trait (no swallowing
default), so the legacy
+ // formatters must implement them. They throw rather than return `None` so
the unsupported-feature
+ // error is surfaced instead of being silently masked under the LEGACY time
parser policy.
+ override def parseNanosOptional(s: String, precision: Int):
Option[TimestampNanosVal] =
+ throw TimestampFormatter.legacyNanosUnsupported()
+
+ // Without this override the trait default throws
SparkException.internalError instead of the
+ // user-facing legacyNanosUnsupported error.
+ override def parseWithoutTimeZoneNanos(
+ s: String,
+ precision: Int,
+ allowTimeZone: Boolean): TimestampNanosVal =
+ throw TimestampFormatter.legacyNanosUnsupported()
+
+ override def parseWithoutTimeZoneNanosOptional(
+ s: String,
+ precision: Int,
+ allowTimeZone: Boolean): Option[TimestampNanosVal] =
+ throw TimestampFormatter.legacyNanosUnsupported()
+
+ override def formatNanos(v: TimestampNanosVal, precision: Int): String =
+ throw TimestampFormatter.legacyNanosUnsupported()
+
+ override def formatWithoutTimeZoneNanos(v: TimestampNanosVal, precision:
Int): String =
+ throw TimestampFormatter.legacyNanosUnsupported()
+
override def validatePatternString(checkLegacy: Boolean): Unit =
fastDateFormat
}
@@ -532,6 +799,35 @@ class LegacySimpleTimestampFormatter(
format(instantToMicros(instant))
}
+ override def parseNanos(s: String, precision: Int): TimestampNanosVal =
+ throw TimestampFormatter.legacyNanosUnsupported()
+
+ // The `*Optional` nanos methods are abstract in the trait (no swallowing
default), so the legacy
+ // formatters must implement them. They throw rather than return `None` so
the unsupported-feature
+ // error is surfaced instead of being silently masked under the LEGACY time
parser policy.
+ override def parseNanosOptional(s: String, precision: Int):
Option[TimestampNanosVal] =
+ throw TimestampFormatter.legacyNanosUnsupported()
+
+ // Without this override the trait default throws
SparkException.internalError instead of the
+ // user-facing legacyNanosUnsupported error.
+ override def parseWithoutTimeZoneNanos(
+ s: String,
+ precision: Int,
+ allowTimeZone: Boolean): TimestampNanosVal =
+ throw TimestampFormatter.legacyNanosUnsupported()
+
+ override def parseWithoutTimeZoneNanosOptional(
+ s: String,
+ precision: Int,
+ allowTimeZone: Boolean): Option[TimestampNanosVal] =
+ throw TimestampFormatter.legacyNanosUnsupported()
+
+ override def formatNanos(v: TimestampNanosVal, precision: Int): String =
+ throw TimestampFormatter.legacyNanosUnsupported()
+
+ override def formatWithoutTimeZoneNanos(v: TimestampNanosVal, precision:
Int): String =
+ throw TimestampFormatter.legacyNanosUnsupported()
+
override def validatePatternString(checkLegacy: Boolean): Unit = sdf
}
@@ -548,6 +844,16 @@ object TimestampFormatter {
def defaultPattern(): String =
s"${DateFormatter.defaultPattern} ${TimeFormatter.defaultPattern}"
+ /**
+ * The legacy formatters (`FastDateFormat` / `SimpleDateFormat`) cap at
millisecond/microsecond
+ * resolution and cannot represent the sub-microsecond remainder of a
[[TimestampNanosVal]].
+ * Nanosecond-capable timestamp types are therefore unsupported under the
`LEGACY` time parser
+ * policy. This is a user-facing error (not an internal error) because the
`LEGACY` policy is
+ * user-configurable and a caller may legitimately combine it with
nanosecond timestamps.
+ */
+ def legacyNanosUnsupported(): SparkUnsupportedOperationException =
+ ExecutionErrors.nanosTimestampUnsupportedWithLegacyParserError()
+
private def getFormatter(
format: Option[String],
zoneId: ZoneId,
diff --git
a/sql/api/src/main/scala/org/apache/spark/sql/errors/ExecutionErrors.scala
b/sql/api/src/main/scala/org/apache/spark/sql/errors/ExecutionErrors.scala
index e6e3fd847298..58a82c7270a7 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/errors/ExecutionErrors.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/errors/ExecutionErrors.scala
@@ -53,6 +53,12 @@ private[sql] trait ExecutionErrors extends
DataTypeErrorsBase {
e)
}
+ def nanosTimestampUnsupportedWithLegacyParserError():
SparkUnsupportedOperationException = {
+ new SparkUnsupportedOperationException(
+ errorClass =
"UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+ messageParameters = Map("config" ->
toSQLConf(SqlApiConf.LEGACY_TIME_PARSER_POLICY_KEY)))
+ }
+
def stateStoreHandleNotInitialized(): SparkRuntimeException = {
new SparkRuntimeException(
errorClass = "STATE_STORE_HANDLE_NOT_INITIALIZED",
diff --git
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
index 47eb4a1e3e3c..2d9793e687a7 100644
---
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
+++
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
@@ -26,7 +26,7 @@ import java.util.concurrent.TimeUnit
import org.scalatest.matchers.must.Matchers
import org.scalatest.matchers.should.Matchers._
-import org.apache.spark.{SparkArithmeticException, SparkDateTimeException,
SparkFunSuite, SparkIllegalArgumentException}
+import org.apache.spark.{SparkArithmeticException, SparkDateTimeException,
SparkException, SparkFunSuite, SparkIllegalArgumentException}
import org.apache.spark.sql.catalyst.plans.SQLHelper
import org.apache.spark.sql.catalyst.util.DateTimeConstants._
import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
@@ -1956,6 +1956,24 @@ class DateTimeUtilsSuite extends SparkFunSuite with
Matchers with SQLHelper {
}
}
+ test("SPARK-57162: nanos converters raise an internal error for precision
outside [7, 9]") {
+ // `precision` is always sourced from a validated
TimestampNTZNanosType/TimestampLTZNanosType
+ // (constructible only with p in [7, 9]), so an out-of-range value is an
internal caller bug,
+ // not user input. Both the NTZ (LocalDateTime) and LTZ (Instant)
converters must reject it.
+ val ldt = LocalDateTime.parse("2019-02-26T16:56:00.123456789")
+ val instant = Instant.parse("2019-02-26T16:56:00.123456789Z")
+ Seq(6, 10).foreach { p =>
+ checkError(
+ exception =
intercept[SparkException](localDateTimeToTimestampNanos(ldt, p)),
+ condition = "INTERNAL_ERROR",
+ parameters = Map("message" -> s"Fractional second precision $p is out
of range [7, 9]."))
+ checkError(
+ exception = intercept[SparkException](instantToTimestampNanos(instant,
p)),
+ condition = "INTERNAL_ERROR",
+ parameters = Map("message" -> s"Fractional second precision $p is out
of range [7, 9]."))
+ }
+ }
+
test("SPARK-57033: random roundtrip across precisions floors to the
precision step") {
val rnd = new scala.util.Random(0)
val min = Instant.parse("0001-01-01T00:00:00Z").getEpochSecond
diff --git
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
index 558d7eda78b4..105fee7193f2 100644
---
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
+++
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
@@ -17,15 +17,16 @@
package org.apache.spark.sql.catalyst.util
-import java.time.{DateTimeException, LocalDateTime, ZoneId}
+import java.time.{DateTimeException, Instant, LocalDateTime, ZoneId}
import java.util.Locale
-import org.apache.spark.{SparkException, SparkUpgradeException}
+import org.apache.spark.{SparkException, SparkUnsupportedOperationException,
SparkUpgradeException}
import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
import org.apache.spark.sql.catalyst.util.DateTimeUtils._
import org.apache.spark.sql.catalyst.util.LegacyDateFormats.LENIENT_SIMPLE_DATE_FORMAT
+import org.apache.spark.sql.catalyst.util.TimestampNanosTestUtils._
import org.apache.spark.sql.internal.{LegacyBehaviorPolicy, SQLConf}
-import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.unsafe.types.{TimestampNanosVal, UTF8String}
class TimestampFormatterSuite extends DatetimeFormatterSuite {
@@ -559,4 +560,220 @@ class TimestampFormatterSuite extends
DatetimeFormatterSuite {
"'yyyy-MM-dd HH:mm:ss' as the target spark data type
\"TIMESTAMP_NTZ\"."))
)
}
+
+ // The expected LTZ value: floor the sub-`precision` fractional digits, then
split into
+ // (epochMicros, nanosWithinMicro). Mirrors
`SparkDateTimeUtils.instantToTimestampNanos`.
+ private def expectedLTZNanos(instant: Instant, precision: Int):
TimestampNanosVal = {
+ val truncatedNano = nanoOfSecTruncator(precision)(instant.getNano)
+ instantToNanosVal(Instant.ofEpochSecond(instant.getEpochSecond,
truncatedNano.toLong))
+ }
+
+ // The expected NTZ value (interpreted at UTC), with sub-`precision` digits
floored.
+ private def expectedNTZNanos(ldt: LocalDateTime, precision: Int):
TimestampNanosVal = {
+ localDateTimeToNanosVal(ldt.withNano(nanoOfSecTruncator(precision)(ldt.getNano)))
+ }
+
+ private val nanosPattern = "yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS"
+
+ test("SPARK-57162: Iso8601 formatter parses strings into TimestampNanosVal
(LTZ)") {
+ outstandingZoneIds.foreach { zoneId =>
+ val formatter = TimestampFormatter(nanosPattern, zoneId, isParsing =
true)
+ foreachNanosPrecision { precision =>
+ specialNanosTs.foreach { ts =>
+ val input = ts.replace(' ', 'T')
+ val expected = expectedLTZNanos(parseSpecialNanosLTZ(ts, zoneId),
precision)
+ assert(formatter.parseNanos(input, precision) === expected)
+ assert(formatter.parseNanosOptional(input,
precision).contains(expected))
+ }
+ }
+ }
+ }
+
+ test("SPARK-57162: Iso8601 formatter parses strings into TimestampNanosVal
(NTZ)") {
+ // NTZ values are zone-independent (interpreted at UTC), so a single
formatter zone suffices.
+ val formatter = TimestampFormatter(nanosPattern, UTC, isParsing = true)
+ foreachNanosPrecision { precision =>
+ specialNanosTs.foreach { ts =>
+ val input = ts.replace(' ', 'T')
+ val expected = expectedNTZNanos(parseSpecialNanosNTZ(ts), precision)
+ assert(formatter.parseWithoutTimeZoneNanos(input, precision) ===
expected)
+ assert(formatter.parseWithoutTimeZoneNanos(input, precision,
allowTimeZone = true) ===
+ expected)
+ assert(formatter.parseWithoutTimeZoneNanosOptional(input, precision,
allowTimeZone = true)
+ .contains(expected))
+ }
+ }
+ }
+
+ test("SPARK-57162: round-trip TimestampNanosVal -> string ->
TimestampNanosVal") {
+ outstandingZoneIds.foreach { zoneId =>
+ val parser = TimestampFormatter(nanosPattern, zoneId, isParsing = true)
+ val printer = TimestampFormatter(nanosPattern, zoneId, isParsing = false)
+ foreachNanosPrecision { precision =>
+ specialNanosTs.foreach { ts =>
+ val value = expectedLTZNanos(parseSpecialNanosLTZ(ts, zoneId),
precision)
+ val formatted = printer.formatNanos(value, precision)
+ assert(parser.parseNanos(formatted, precision) === value)
+ }
+ }
+ }
+ }
+
+ test("SPARK-57162: sub-precision fractional digits are truncated on parse") {
+ val formatter = TimestampFormatter(nanosPattern, UTC, isParsing = true)
+ val input = "1970-01-01T00:00:00.123456789"
+ Seq(
+ 9 -> nanosVal(123456L, 789),
+ 8 -> nanosVal(123456L, 780),
+ 7 -> nanosVal(123456L, 700)).foreach { case (precision, expected) =>
+ assert(formatter.parseNanos(input, precision) === expected)
+ assert(formatter.parseWithoutTimeZoneNanos(input, precision) ===
expected)
+ }
+ }
+
+ test("SPARK-57162: formatNanos truncates to precision and renders per
pattern width") {
+ val value = nanosVal(123456L, 789) // 1970-01-01 00:00:00.123456789 at UTC
+ val fixed = TimestampFormatter(nanosPattern, UTC, isParsing = false)
+ assert(fixed.formatNanos(value, 9) === "1970-01-01T00:00:00.123456789")
+ assert(fixed.formatNanos(value, 8) === "1970-01-01T00:00:00.123456780")
+ assert(fixed.formatNanos(value, 7) === "1970-01-01T00:00:00.123456700")
+
+ // The fraction formatter omits trailing zeros.
+ val fraction = TimestampFormatter.getFractionFormatter(UTC)
+ assert(fraction.formatNanos(value, 9) === "1970-01-01 00:00:00.123456789")
+ assert(fraction.formatNanos(value, 8) === "1970-01-01 00:00:00.12345678")
+ assert(fraction.formatNanos(value, 7) === "1970-01-01 00:00:00.1234567")
+ }
+
+ test("SPARK-57162: formatWithoutTimeZoneNanos is zone-independent (NTZ)") {
+ // Regression guard for an LTZ-only `formatNanos`: with a non-UTC
formatter zone, the NTZ
+ // method must render the UTC-grid wall clock unchanged, whereas
`formatNanos` (LTZ) routes
+ // through `format(Instant)` and shifts the value into the zone. All-UTC
NTZ cases miss this.
+ val value = nanosVal(123456L, 789) // wall clock 1970-01-01
00:00:00.123456789 on the UTC grid
+ val zone = getZoneId("+01:00")
+ val printer = TimestampFormatter(nanosPattern, zone, isParsing = false)
+ // The 9-`S` pattern always emits 9 fractional digits; truncation zeros
the low ones.
+ Seq(
+ 9 -> "1970-01-01T00:00:00.123456789",
+ 8 -> "1970-01-01T00:00:00.123456780",
+ 7 -> "1970-01-01T00:00:00.123456700").foreach { case (precision,
expectedNtz) =>
+ assert(printer.formatWithoutTimeZoneNanos(value, precision) ===
expectedNtz)
+ }
+ // LTZ rendering of the same value is shifted by the +01:00 offset.
+ assert(printer.formatNanos(value, 9) === "1970-01-01T01:00:00.123456789")
+ // The NTZ output round-trips through the matching NTZ parser regardless
of formatter zone.
+ val parser = TimestampFormatter(nanosPattern, zone, isParsing = true)
+ assert(parser.parseWithoutTimeZoneNanos(
+ printer.formatWithoutTimeZoneNanos(value, 9), 9) === value)
+ }
+
+ test("SPARK-57162: NTZ nanos parse rejects a time zone when not allowed") {
+ val formatter = TimestampFormatter(
+ "yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSSXXX",
+ UTC,
+ isParsing = true)
+ val input = "2018-12-02T10:11:12.123456789+01:00"
+ // When the zone component is allowed it is discarded and the local fields are kept.
+ val expected = expectedNTZNanos(LocalDateTime.of(2018, 12, 2, 10, 11, 12, 123456789), 9)
+ assert(formatter.parseWithoutTimeZoneNanos(input, 9, allowTimeZone = true) === expected)
+
+ intercept[SparkException] {
+ formatter.parseWithoutTimeZoneNanos(input, 9, allowTimeZone = false)
+ }
+ assert(formatter.parseWithoutTimeZoneNanosOptional(input, 9, allowTimeZone = false).isEmpty)
+ }
+
+ test("SPARK-57162: DefaultTimestampFormatter parses nanos without a pattern") {
+ outstandingZoneIds.foreach { zoneId =>
+ val formatter = new DefaultTimestampFormatter(
+ zoneId,
+ locale = DateFormatter.defaultLocale,
+ legacyFormat = LegacyDateFormats.SIMPLE_DATE_FORMAT,
+ isParsing = true)
+ val ldt = LocalDateTime.of(2021, 8, 12, 18, 31, 50, 123456789)
+ val input = "2021-08-12T18:31:50.123456789"
+ foreachNanosPrecision { precision =>
+ val expectedLtz = expectedLTZNanos(ldt.atZone(zoneId).toInstant, precision)
+ assert(formatter.parseNanos(input, precision) === expectedLtz)
+ assert(formatter.parseNanosOptional(input, precision).contains(expectedLtz))
+ val expectedNtz = expectedNTZNanos(ldt, precision)
+ assert(formatter.parseWithoutTimeZoneNanos(input, precision) === expectedNtz)
+ assert(formatter.parseWithoutTimeZoneNanosOptional(input, precision, allowTimeZone = true)
+ .contains(expectedNtz))
+ }
+ assert(formatter.parseNanosOptional("x123", 9).isEmpty)
+ assert(formatter.parseWithoutTimeZoneNanosOptional("x123", 9, allowTimeZone = true).isEmpty)
+ }
+ }
+
+ test("SPARK-57162: DefaultTimestampFormatter.formatNanos uses the default pattern (no fracs)") {
+ // DefaultTimestampFormatter inherits Iso8601TimestampFormatter.formatNanos, which renders via
+ // the default pattern "yyyy-MM-dd HH:mm:ss". That pattern has no S fields, so sub-second
+ // digits are not emitted. This is expected behaviour: DefaultTimestampFormatter is
+ // parse-oriented and callers that need fractional output should use FractionTimestampFormatter.
+ val formatter = new DefaultTimestampFormatter(
+ UTC,
+ locale = DateFormatter.defaultLocale,
+ legacyFormat = LegacyDateFormats.SIMPLE_DATE_FORMAT,
+ isParsing = false)
+ val value = nanosVal(123456L, 789) // 1970-01-01 00:00:00.123456789 UTC
+ assert(formatter.formatNanos(value, 9) === "1970-01-01 00:00:00")
+ assert(formatter.formatNanos(value, 7) === "1970-01-01 00:00:00")
+ }
+
+ test("SPARK-57162: legacy formatters reject nanosecond precision") {
+ val fast = new LegacyFastTimestampFormatter(
+ "yyyy-MM-dd HH:mm:ss.SSSSSS",
+ zoneId = UTC,
+ locale = DateFormatter.defaultLocale)
+ val simple = new LegacySimpleTimestampFormatter(
+ "yyyy-MM-dd HH:mm:ss.SSSSSS",
+ zoneId = UTC,
+ locale = DateFormatter.defaultLocale)
+ val expectedParameters = Map(
+ "config" -> ("\"" + SQLConf.LEGACY_TIME_PARSER_POLICY.key + "\""))
+ Seq[TimestampFormatter](fast, simple).foreach { formatter =>
+ checkError(
+ exception = intercept[SparkUnsupportedOperationException] {
+ formatter.parseNanos("2020-01-01 00:00:00.123456789", 9)
+ },
+ condition = "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+ parameters = expectedParameters)
+ // The optional variants must surface the unsupported-feature error too, not swallow it and
+ // return None. Their counterparts are abstract in the trait specifically to force this.
+ checkError(
+ exception = intercept[SparkUnsupportedOperationException] {
+ formatter.parseNanosOptional("2020-01-01 00:00:00.123456789", 9)
+ },
+ condition = "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+ parameters = expectedParameters)
+ checkError(
+ exception = intercept[SparkUnsupportedOperationException] {
+ formatter.formatNanos(nanosVal(0L, 1), 9)
+ },
+ condition = "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+ parameters = expectedParameters)
+ checkError(
+ exception = intercept[SparkUnsupportedOperationException] {
+ formatter.parseWithoutTimeZoneNanos("2020-01-01 00:00:00.123456789", 9)
+ },
+ condition = "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+ parameters = expectedParameters)
+ checkError(
+ exception = intercept[SparkUnsupportedOperationException] {
+ formatter.parseWithoutTimeZoneNanosOptional(
+ "2020-01-01 00:00:00.123456789",
+ 9,
+ allowTimeZone = true)
+ },
+ condition = "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+ parameters = expectedParameters)
+ checkError(
+ exception = intercept[SparkUnsupportedOperationException] {
+ formatter.formatWithoutTimeZoneNanos(nanosVal(0L, 1), 9)
+ },
+ condition = "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER",
+ parameters = expectedParameters)
+ }
+ }
}