Pradip Sodha created SPARK-34097:
------------------------------------
Summary: overflow for datetime datatype when creating stride +
JDBC parallel read
Key: SPARK-34097
URL: https://issues.apache.org/jira/browse/SPARK-34097
Project: Spark
Issue Type: Bug
Components: Spark Core, SQL
Affects Versions: 3.0.1
Environment: spark 3.0.1
sql server v12.0
Reporter: Pradip Sodha
I'm trying to do JDBC parallel read with datetime column as partition column
{code:java}
create table eData (eid int, start_time datetime) -- sql server v12.0{code}
{code:java}
val df = spark // spark 3.0.1
.read
.format("jdbc")
.option("url", "jdbc:sqlserver://...")
.option("partitionColumn", "start_time")
.option("lowerBound", "2000-01-01T01:01:11.546")
.option("upperBound", "2000-01-02T01:01:11.547")
.option("numPartitions", "10")
.option("dbtable", "eData")
.load();
df.show(false){code}
and getting this error,
{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID
7, 10.139.64.6, executor 0): com.microsoft.sqlserver.jdbc.SQLServerException:
Conversion failed when converting date and/or time from character string.
at
com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
at
com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:5435)
at
com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1770)
at
com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:1028)
at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)
at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:343)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
Source)
at
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:731)
at
org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
at
org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:187)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
at org.apache.spark.scheduler.Task.run(Task.scala:117)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:657)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:660)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
...........
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed
when converting date and/or time from character string.{code}
which is expted because, because the query desing by spark is,
{code:java}
21/01/13 11:09:37 INFO JDBCRelation: Number of partitions: 10, WHERE clauses of
these partitions: "start_time" < '2000-01-01 03:25:11.5461' or "start_time" is
null, "start_time" >= '2000-01-01 03:25:11.5461' AND "start_time" < '2000-01-01
05:49:11.5462', "start_time" >= '2000-01-01 05:49:11.5462' AND "start_time" <
'2000-01-01 08:13:11.5463', "start_time" >= '2000-01-01 08:13:11.5463' AND
"start_time" < '2000-01-01 10:37:11.5464', "start_time" >= '2000-01-01
10:37:11.5464' AND "start_time" < '2000-01-01 13:01:11.5465', "start_time" >=
'2000-01-01 13:01:11.5465' AND "start_time" < '2000-01-01 15:25:11.5466',
"start_time" >= '2000-01-01 15:25:11.5466' AND "start_time" < '2000-01-01
17:49:11.5467', "start_time" >= '2000-01-01 17:49:11.5467' AND "start_time" <
'2000-01-01 20:13:11.5468', "start_time" >= '2000-01-01 20:13:11.5468' AND
"start_time" < '2000-01-01 22:37:11.5469', "start_time" >= '2000-01-01
22:37:11.5469'
{code}
so, the date use in query is '2000-01-01 22:37:11.*5469*' but datetime accept
only three digit for fraction - YYYY-MM-DDThh:mm:ss[*.mmm*] [datetime
doc|https://docs.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql?view=sql-server-ver15]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]