[
https://issues.apache.org/jira/browse/SPARK-34097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264538#comment-17264538
]
Takeshi Yamamuro commented on SPARK-34097:
------------------------------------------
Ah, I see. Thanks for the report. This issue reminds me of
https://issues.apache.org/jira/browse/SPARK-28587. Since the timestamp formatting in
the WHERE clause looks database-dependent, I'm now thinking we might need
to handle it in JdbcDialect...
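Just to sketch the idea (illustration only: Spark 3.0.1 has no such hook, the method name below is made up, and truncating to millisecond precision for SQL Server's datetime is an assumption based on its docs), the dialect could be the one that renders the partition-boundary timestamp instead of a generic formatter:
{code:scala}
import java.sql.Timestamp
import java.time.format.DateTimeFormatter

// Illustration only -- JdbcDialect has no such hook today.
object MsSqlServerBoundarySketch {
  private val millisFmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")

  // Hypothetical dialect hook: render a partition-boundary timestamp with at
  // most three fractional digits so SQL Server's datetime can parse it.
  // Dropping sub-millisecond digits only moves rows between adjacent
  // partitions; the first/last clauses stay open-ended, so no rows are lost.
  def boundaryLiteral(ts: Timestamp): String =
    s"'${ts.toLocalDateTime.format(millisFmt)}'"
}

// boundaryLiteral(Timestamp.valueOf("2000-01-01 03:25:11.5461"))
//   => '2000-01-01 03:25:11.546'
{code}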
> overflow for datetime datatype when creating stride + JDBC parallel read
> -------------------------------------------------------------------------
>
> Key: SPARK-34097
> URL: https://issues.apache.org/jira/browse/SPARK-34097
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 3.0.1
> Environment: spark 3.0.1
> sql server v12.0
> Reporter: Pradip Sodha
> Priority: Major
>
> I'm trying to do a JDBC parallel read with a datetime column as the partition column
> {code:java}
> create table eData (eid int, start_time datetime) -- SQL Server v12.0
> -- insert some sample rows{code}
>
> {code:java}
> val df = spark // spark 3.0.1
>   .read
>   .format("jdbc")
>   .option("url", "jdbc:sqlserver://...")
>   .option("partitionColumn", "start_time")
>   .option("lowerBound", "2000-01-01T01:01:11.546")
>   .option("upperBound", "2000-01-02T01:01:11.547")
>   .option("numPartitions", "10")
>   .option("dbtable", "eData")
>   .load()
> df.show(false){code}
> and I'm getting this error:
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, 10.139.64.6, executor 0): com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting date and/or time from character string.
>   at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
>   at com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:5435)
>   at com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1770)
>   at com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:1028)
>   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)
>   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:343)
>   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:731)
>   at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
>   at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:187)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
>   at org.apache.spark.scheduler.Task.run(Task.scala:117)
>   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:657)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:660)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   ...........
> Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting date and/or time from character string.{code}
>
> which is expected, because the WHERE clauses designed by Spark are:
> {code:java}
> 21/01/13 11:09:37 INFO JDBCRelation: Number of partitions: 10, WHERE clauses of these partitions:
> "start_time" < '2000-01-01 03:25:11.5461' or "start_time" is null,
> "start_time" >= '2000-01-01 03:25:11.5461' AND "start_time" < '2000-01-01 05:49:11.5462',
> "start_time" >= '2000-01-01 05:49:11.5462' AND "start_time" < '2000-01-01 08:13:11.5463',
> "start_time" >= '2000-01-01 08:13:11.5463' AND "start_time" < '2000-01-01 10:37:11.5464',
> "start_time" >= '2000-01-01 10:37:11.5464' AND "start_time" < '2000-01-01 13:01:11.5465',
> "start_time" >= '2000-01-01 13:01:11.5465' AND "start_time" < '2000-01-01 15:25:11.5466',
> "start_time" >= '2000-01-01 15:25:11.5466' AND "start_time" < '2000-01-01 17:49:11.5467',
> "start_time" >= '2000-01-01 17:49:11.5467' AND "start_time" < '2000-01-01 20:13:11.5468',
> "start_time" >= '2000-01-01 20:13:11.5468' AND "start_time" < '2000-01-01 22:37:11.5469',
> "start_time" >= '2000-01-01 22:37:11.5469'
> {code}
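> The fourth fractional digit comes from the stride itself (a back-of-the-envelope check below, not the exact Spark code path): the bounds are one day plus 1 ms apart, so a 10-way split gives a stride of 2 h 24 min plus 100 microseconds, and each boundary accumulates another 0.0001 s.
> {code:scala}
> // Back-of-the-envelope check, not the exact JDBCRelation code path.
> import java.time.{Duration, LocalDateTime}
>
> val lower = LocalDateTime.parse("2000-01-01T01:01:11.546")
> val upper = LocalDateTime.parse("2000-01-02T01:01:11.547")
>
> val totalMicros  = Duration.between(lower, upper).toNanos / 1000 // 86,400,001,000
> val strideMicros = totalMicros / 10                              // 8,640,000,100 = 2h24m + 100 us
>
> // First internal boundary: 2000-01-01T03:25:11.546100, which Spark renders
> // as '2000-01-01 03:25:11.5461' -- four fractional digits.
> val firstBoundary = lower.plusNanos(strideMicros * 1000)
> {code}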
> so the timestamp used in the query, '2000-01-01 22:37:11.*5469*', has four fractional
> digits, but datetime accepts only three - YYYY-MM-DDThh:mm:ss[.mmm] (see the [datetime
> doc|https://docs.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql?view=sql-server-ver15]).
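> A possible workaround until this is handled (an untested sketch; the URL and connection properties are placeholders) is to pass explicit predicates so Spark never has to format the boundary timestamps itself:
> {code:scala}
> // Untested workaround sketch: build the partition predicates by hand with at
> // most millisecond precision and use the predicates variant of
> // DataFrameReader.jdbc, so Spark does not generate sub-millisecond literals.
> import java.util.Properties
>
> val props = new Properties()
> props.setProperty("user", "...")      // placeholder
> props.setProperty("password", "...")  // placeholder
>
> val predicates = Array(
>   "start_time <  '2000-01-01 03:25:12' or start_time is null",
>   "start_time >= '2000-01-01 03:25:12' and start_time < '2000-01-01 05:49:12'",
>   // ... remaining ranges, each boundary kept at <= 3 fractional digits ...
>   "start_time >= '2000-01-01 22:37:12'"
> )
>
> val df = spark.read.jdbc("jdbc:sqlserver://...", "eData", predicates, props)
> df.show(false)
> {code}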
>
>