Vinod KC created SPARK-54601:
--------------------------------
Summary: Data loss in TimestampType when writing to MS SQL Server:
Microsecond precision truncated to milliseconds
Key: SPARK-54601
URL: https://issues.apache.org/jira/browse/SPARK-54601
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.0.0, 3.5.0
Reporter: Vinod KC
Spark silently loses microsecond precision when writing *timestamps* to MS SQL
Server via JDBC, causing data corruption.
Spark maps TimestampType/TimestampTypeNTZ to SQL Server's legacy DATETIME type
which only supports millisecond precision (3 decimal places). The last 3
decimal places (microseconds) are automatically truncated with no warning.
Data Loss Example - Complete Round-Trip
{code:java}
import java.sql.Timestamp
// Create data with Spark's full microsecond precision
val data = Seq(
(1, Timestamp.valueOf("2024-12-03 14:25:37.789123")),
(2, Timestamp.valueOf("2024-12-03 14:25:37.789456"))
).toDF("id", "timestamp")
// Original microseconds: 789123, 789456 (Spark stores 6 decimal places)
data.collect().foreach(r =>
println(s"Microseconds: ${r.getAs[Timestamp]("timestamp").getNanos / 1000}")
)
// Output: Microseconds: 789123, Microseconds: 789456
// Write to SQL Server (mapped to DATETIME - only 3 decimal places!)
data.write.mode("overwrite").jdbc(jdbcUrl, "test_table", props)
// Read back
val result = spark.read.jdbc(jdbcUrl, "test_table", props)
// Retrieved microseconds: 789000, 789000 ❌ DATA LOST!
result.collect().foreach(r =>
println(s"Microseconds: ${r.getAs[Timestamp]("timestamp").getNanos / 1000}")
)
// Output: Microseconds: 789000, Microseconds: 789000
// Lost: 123 and 456 microseconds respectively (digits 4-6)
// Both records now appear simultaneous - ordering corrupted
{code}
*Root cause :*
Spark currently maps TimestampType and TimestampNTZType to SQL Server's legacy
DATETIME type, which only supports 3 decimal places (millisecond precision).
This causes automatic truncation of any microsecond data beyond the third
decimal place.
*References:*
Microsoft Documentation: "Avoid using datetime for new work. Instead, use the
datetime2 data type."
[https://learn.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql]
{code:java}
DATETIME limitations: 3.33 milliseconds precision (rounded)
DATETIME2 precision: 100 nanoseconds (0.0000001 seconds){code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]