Vinod KC created SPARK-54601:
--------------------------------

             Summary: Data loss in TimestampType when writing to MS SQL Server: 
Microsecond precision truncated to milliseconds
                 Key: SPARK-54601
                 URL: https://issues.apache.org/jira/browse/SPARK-54601
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.0.0, 3.5.0
            Reporter: Vinod KC


Spark silently loses microsecond precision when writing *timestamps* to MS SQL 
Server via JDBC: the values read back no longer match the values written.

Spark maps TimestampType and TimestampNTZType to SQL Server's legacy DATETIME type, 
which only supports millisecond precision (3 fractional-second digits). The last three 
fractional digits (the microsecond part) are truncated automatically, with no warning.
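
The mapping can be confirmed without touching a database by asking the SQL Server dialect which JDBC type it picks for TimestampType. A minimal sketch (the JDBC URL below is just a placeholder used for dialect resolution):

{code:java}
import org.apache.spark.sql.jdbc.JdbcDialects
import org.apache.spark.sql.types.TimestampType

// Resolve the built-in SQL Server dialect from a placeholder JDBC URL
val dialect = JdbcDialects.get("jdbc:sqlserver://localhost;databaseName=test")

// Print the database column type Spark will emit for TimestampType columns
println(dialect.getJDBCType(TimestampType).map(_.databaseTypeDefinition))
// With the current mapping this prints: Some(DATETIME)
{code}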

*Data Loss Example - Complete Round-Trip*

{code:java}
import java.sql.Timestamp
import spark.implicits._  // for toDF; `spark`, `jdbcUrl` and `props` are assumed to be defined

// Create data with Spark's full microsecond precision
val data = Seq(
  (1, Timestamp.valueOf("2024-12-03 14:25:37.789123")),
  (2, Timestamp.valueOf("2024-12-03 14:25:37.789456"))
).toDF("id", "timestamp")
// Original microseconds: 789123, 789456 (Spark stores 6 decimal places)
data.collect().foreach(r => 
  println(s"Microseconds: ${r.getAs[Timestamp]("timestamp").getNanos / 1000}")
)
// Output: Microseconds: 789123, Microseconds: 789456
// Write to SQL Server (mapped to DATETIME - only 3 decimal places!)
data.write.mode("overwrite").jdbc(jdbcUrl, "test_table", props)
// Read back
val result = spark.read.jdbc(jdbcUrl, "test_table", props)
// Retrieved microseconds: 789000, 789000  ❌ DATA LOST!
result.collect().foreach(r => 
  println(s"Microseconds: ${r.getAs[Timestamp]("timestamp").getNanos / 1000}")
)
// Output: Microseconds: 789000, Microseconds: 789000
// Lost: 123 and 456 microseconds respectively (digits 4-6)
// Both records now appear simultaneous - ordering corrupted
{code}
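
To see the loss side by side, the written and read-back rows can be joined on id and diffed in microseconds. This sketch reuses the data and result DataFrames from the example above:

{code:java}
import org.apache.spark.sql.functions._

// Join the original DataFrame with the read-back table and compute the
// per-row difference in microseconds
data.alias("w")
  .join(result.alias("r"), "id")
  .select(
    col("id"),
    col("w.timestamp").alias("written"),
    col("r.timestamp").alias("read_back"),
    (unix_micros(col("w.timestamp")) - unix_micros(col("r.timestamp"))).alias("lost_micros"))
  .show(false)
// Per the output above, lost_micros is 123 and 456 for the two rows
{code}
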
*Root cause:*

Spark's SQL Server dialect (MsSqlServerDialect) currently maps TimestampType and 
TimestampNTZType to the legacy DATETIME type, which only supports 3 fractional-second 
digits (millisecond precision). Any microsecond data beyond the third digit is therefore 
truncated automatically on write.
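
Until the default mapping is changed, one possible user-side workaround is to register a custom JdbcDialect that overrides only the timestamp mapping so that tables created by Spark use DATETIME2(6). This is a sketch of the workaround, not the proposed fix; the object name is illustrative and it must be registered before the write:

{code:java}
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types.{DataType, TimestampType}

// Illustrative custom dialect: map TimestampType to DATETIME2(6), which keeps
// microsecond precision, and defer all other types to the built-in dialect.
object MsSqlDatetime2Dialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:sqlserver")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case TimestampType => Some(JdbcType("DATETIME2(6)", Types.TIMESTAMP))
    case _             => None  // fall through to the default SQL Server dialect
  }
}

// Register before writing; tables created by df.write.jdbc(...) then use DATETIME2(6)
JdbcDialects.registerDialect(MsSqlDatetime2Dialect)
{code}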

 

*References:*
Microsoft documentation: "Avoid using datetime for new work. Instead, use the datetime2 data type."
[https://learn.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql]

 
{code:java}
DATETIME  precision: 3.33 milliseconds (values rounded to .000, .003, or .007)
DATETIME2 precision: 100 nanoseconds (0.0000001 seconds)
{code}
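
For context on why 6 fractional digits are enough: Spark's TimestampType is itself limited to microsecond precision, so DATETIME2(6) can represent every timestamp value Spark can produce. A small sketch (assumes a SparkSession named spark is in scope):

{code:java}
import java.sql.Timestamp
import spark.implicits._

// A value with 7 fractional digits already loses the 7th digit inside Spark,
// because Catalyst stores timestamps as microseconds.
val ts = Timestamp.valueOf("2024-12-03 14:25:37.7891234")
val roundTripped = Seq(ts).toDF("t").head.getTimestamp(0)
println(ts.getNanos)           // 789123400
println(roundTripped.getNanos) // 789123000 -> microsecond precision is the ceiling
{code}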