[
https://issues.apache.org/jira/browse/SPARK-34097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pradip Sodha updated SPARK-34097:
---------------------------------
Description:
I'm trying to do a JDBC parallel read with a datetime column as the partition column:
{code:java}
create table eData (eid int, start_time datetime) -- sql server v12.0
--inserting some data{code}
{code:java}
val df = spark // spark 3.0.1
.read
.format("jdbc")
.option("url", "jdbc:sqlserver://...")
.option("partitionColumn", "start_time")
.option("lowerBound", "2000-01-01T01:01:11.546")
.option("upperBound", "2000-01-02T01:01:11.547")
.option("numPartitions", "10")
.option("dbtable", "eData")
.load();
df.show(false){code}
and I get this error:
{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, 10.139.64.6, executor 0): com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting date and/or time from character string.
	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
	at com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:5435)
	at com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1770)
	at com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:1028)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:343)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:731)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:187)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
	at org.apache.spark.scheduler.Task.run(Task.scala:117)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:657)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:660)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	...........
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting date and/or time from character string.{code}
This is expected, because the WHERE clauses Spark generates are:
{code:java}
21/01/13 11:09:37 INFO JDBCRelation: Number of partitions: 10, WHERE clauses of these partitions:
"start_time" < '2000-01-01 03:25:11.5461' or "start_time" is null,
"start_time" >= '2000-01-01 03:25:11.5461' AND "start_time" < '2000-01-01 05:49:11.5462',
"start_time" >= '2000-01-01 05:49:11.5462' AND "start_time" < '2000-01-01 08:13:11.5463',
"start_time" >= '2000-01-01 08:13:11.5463' AND "start_time" < '2000-01-01 10:37:11.5464',
"start_time" >= '2000-01-01 10:37:11.5464' AND "start_time" < '2000-01-01 13:01:11.5465',
"start_time" >= '2000-01-01 13:01:11.5465' AND "start_time" < '2000-01-01 15:25:11.5466',
"start_time" >= '2000-01-01 15:25:11.5466' AND "start_time" < '2000-01-01 17:49:11.5467',
"start_time" >= '2000-01-01 17:49:11.5467' AND "start_time" < '2000-01-01 20:13:11.5468',
"start_time" >= '2000-01-01 20:13:11.5468' AND "start_time" < '2000-01-01 22:37:11.5469',
"start_time" >= '2000-01-01 22:37:11.5469'
{code}
So the literal used in the query is '2000-01-01 22:37:11.*5469*', but SQL Server's datetime type accepts only three digits of fractional seconds, YYYY-MM-DDThh:mm:ss[.mmm] ([datetime doc|https://docs.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql?view=sql-server-ver15]).
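The fourth fractional digit comes from the stride arithmetic: the bounds span 24h + 1ms, so dividing by 10 partitions gives a stride of 2h 24m 0.0001s. A minimal sketch with plain java.time (illustrative only, not Spark's actual JDBCRelation code) reproduces the first boundary from the log above:

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class StrideSketch {
    public static void main(String[] args) {
        // Same bounds as the reader options above.
        LocalDateTime lower = LocalDateTime.parse("2000-01-01T01:01:11.546");
        LocalDateTime upper = LocalDateTime.parse("2000-01-02T01:01:11.547");
        int numPartitions = 10;

        // Range is 86,400,001,000 µs (24h + 1ms); divided by 10 partitions
        // the stride is 8,640,000,100 µs, i.e. 2h 24m 0.0001s.
        long strideMicros = ChronoUnit.MICROS.between(lower, upper) / numPartitions;

        // The very first boundary already carries four fractional digits (.5461),
        // which datetime's [.mmm] precision cannot represent.
        LocalDateTime firstBoundary = lower.plus(strideMicros, ChronoUnit.MICROS);
        System.out.println(firstBoundary); // 2000-01-01T03:25:11.546100
    }
}
```

This matches the '2000-01-01 03:25:11.5461' boundary in the logged WHERE clauses, so every partition predicate after the first contains a literal that SQL Server's datetime cannot parse.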
> overflow for datetime datatype when creating stride + JDBC parallel read
> -------------------------------------------------------------------------
>
> Key: SPARK-34097
> URL: https://issues.apache.org/jira/browse/SPARK-34097
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 3.0.1
> Environment: spark 3.0.1
> sql server v12.0
> Reporter: Pradip Sodha
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]