[jira] [Created] (SPARK-34097) overflow for datetime datatype when creating stride + JDBC parallel read

Pradip Sodha (Jira) Wed, 13 Jan 2021 03:48:12 -0800

Pradip Sodha created SPARK-34097:
------------------------------------

             Summary: overflow for datetime datatype when creating stride +  
JDBC parallel read
                 Key: SPARK-34097
                 URL: https://issues.apache.org/jira/browse/SPARK-34097
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 3.0.1
         Environment: spark 3.0.1


sql server v12.0
            Reporter: Pradip Sodha


I'm trying to do JDBC parallel read with datetime  column as partition column
{code:java}
create table eData (eid int, start_time datetime) -- sql server v12.0{code}
 
{code:java}
val df = spark // spark 3.0.1
    .read
    .format("jdbc")
    .option("url", "jdbc:sqlserver://...")
    .option("partitionColumn", "start_time")
    .option("lowerBound", "2000-01-01T01:01:11.546")
    .option("upperBound", "2000-01-02T01:01:11.547")
    .option("numPartitions", "10")
    .option("dbtable", "eData")
    .load();
df.show(false){code}
and getting this error,
{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 
7, 10.139.64.6, executor 0): com.microsoft.sqlserver.jdbc.SQLServerException: 
Conversion failed when converting date and/or time from character string.     
at 
com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
        at 
com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:5435)
        at 
com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1770)
        at 
com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:1028)
        at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)
        at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:343)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
        at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
        at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:731)
        at 
org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
        at 
org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:187)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
        at org.apache.spark.scheduler.Task.run(Task.scala:117)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:657)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:660)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
...........
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed 
when converting date and/or time from character string.{code}
 

which is expted because, because the query desing by spark is,
{code:java}
21/01/13 11:09:37 INFO JDBCRelation: Number of partitions: 10, WHERE clauses of 
these partitions: "start_time" < '2000-01-01 03:25:11.5461' or "start_time" is 
null, "start_time" >= '2000-01-01 03:25:11.5461' AND "start_time" < '2000-01-01 
05:49:11.5462', "start_time" >= '2000-01-01 05:49:11.5462' AND "start_time" < 
'2000-01-01 08:13:11.5463', "start_time" >= '2000-01-01 08:13:11.5463' AND 
"start_time" < '2000-01-01 10:37:11.5464', "start_time" >= '2000-01-01 
10:37:11.5464' AND "start_time" < '2000-01-01 13:01:11.5465', "start_time" >= 
'2000-01-01 13:01:11.5465' AND "start_time" < '2000-01-01 15:25:11.5466', 
"start_time" >= '2000-01-01 15:25:11.5466' AND "start_time" < '2000-01-01 
17:49:11.5467', "start_time" >= '2000-01-01 17:49:11.5467' AND "start_time" < 
'2000-01-01 20:13:11.5468', "start_time" >= '2000-01-01 20:13:11.5468' AND 
"start_time" < '2000-01-01 22:37:11.5469', "start_time" >= '2000-01-01 
22:37:11.5469'
{code}
so, the date use in query is '2000-01-01 22:37:11.*5469*' but datetime accept 
only three digit for fraction - YYYY-MM-DDThh:mm:ss[*.mmm*] [datetime 
doc|https://docs.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql?view=sql-server-ver15]

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Created] (SPARK-34097) overflow for datetime datatype when creating stride + JDBC parallel read

Reply via email to