[
https://issues.apache.org/jira/browse/SPARK-34097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pradip Sodha updated SPARK-34097:
---------------------------------
Description:
I'm trying to do a JDBC parallel read with a datetime column as the partition column:
{code:java}
create table eData (eid int, start_time datetime) -- sql server v12.0
--inserting some data{code}
{code:java}
val df = spark // spark 3.0.1
.read
.format("jdbc")
.option("url", "jdbc:sqlserver://...")
.option("partitionColumn", "start_time")
.option("lowerBound", "2000-01-01T01:01:11.546")
.option("upperBound", "2000-01-02T01:01:11.547")
.option("numPartitions", "10")
.option("dbtable", "eData")
.load();
df.show(false){code}
and I get this error:
{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, 10.139.64.6, executor 0): com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting date and/or time from character string.
	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
	at com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:5435)
	at com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1770)
	at com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:1028)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:343)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:731)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:187)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
	at org.apache.spark.scheduler.Task.run(Task.scala:117)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:657)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:660)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	...........
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting date and/or time from character string.{code}
This is expected, because the WHERE clauses Spark generates are:
{code:java}
21/01/13 11:09:37 INFO JDBCRelation: Number of partitions: 10, WHERE clauses of these partitions:
"start_time" < '2000-01-01 03:25:11.5461' or "start_time" is null,
"start_time" >= '2000-01-01 03:25:11.5461' AND "start_time" < '2000-01-01 05:49:11.5462',
"start_time" >= '2000-01-01 05:49:11.5462' AND "start_time" < '2000-01-01 08:13:11.5463',
"start_time" >= '2000-01-01 08:13:11.5463' AND "start_time" < '2000-01-01 10:37:11.5464',
"start_time" >= '2000-01-01 10:37:11.5464' AND "start_time" < '2000-01-01 13:01:11.5465',
"start_time" >= '2000-01-01 13:01:11.5465' AND "start_time" < '2000-01-01 15:25:11.5466',
"start_time" >= '2000-01-01 15:25:11.5466' AND "start_time" < '2000-01-01 17:49:11.5467',
"start_time" >= '2000-01-01 17:49:11.5467' AND "start_time" < '2000-01-01 20:13:11.5468',
"start_time" >= '2000-01-01 20:13:11.5468' AND "start_time" < '2000-01-01 22:37:11.5469',
"start_time" >= '2000-01-01 22:37:11.5469'
{code}
So the literal used in the query is '2000-01-01 22:37:11.*5469*', but SQL Server's datetime type accepts only three digits of fractional seconds, YYYY-MM-DDThh:mm:ss[.mmm] ([datetime doc|https://docs.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql?view=sql-server-ver15]).
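The fourth fractional digit comes from the stride arithmetic: the bounds span 24h + 1ms, so dividing by 10 partitions gives a stride of 2h 24m 0.0001s. A minimal sketch with plain java.time (illustrative only, not Spark's actual JDBCRelation code) reproduces the first boundary from the log above:

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class StrideSketch {
    public static void main(String[] args) {
        // Same bounds as the reader options above.
        LocalDateTime lower = LocalDateTime.parse("2000-01-01T01:01:11.546");
        LocalDateTime upper = LocalDateTime.parse("2000-01-02T01:01:11.547");
        int numPartitions = 10;

        // Range is 86,400,001,000 µs (24h + 1ms); divided by 10 partitions
        // the stride is 8,640,000,100 µs, i.e. 2h 24m 0.0001s.
        long strideMicros = ChronoUnit.MICROS.between(lower, upper) / numPartitions;

        // The very first boundary already carries four fractional digits (.5461),
        // which datetime's [.mmm] precision cannot represent.
        LocalDateTime firstBoundary = lower.plus(strideMicros, ChronoUnit.MICROS);
        System.out.println(firstBoundary); // 2000-01-01T03:25:11.546100
    }
}
```

This matches the '2000-01-01 03:25:11.5461' boundary in the logged WHERE clauses, so every partition predicate after the first contains a literal that SQL Server's datetime cannot parse.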
> overflow for datetime datatype when creating stride + JDBC parallel read
> -------------------------------------------------------------------------
>
> Key: SPARK-34097
> URL: https://issues.apache.org/jira/browse/SPARK-34097
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 3.0.1
> Environment: spark 3.0.1
> sql server v12.0
> Reporter: Pradip Sodha
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]