Danke Liu created SPARK-47542:
---------------------------------
Summary: spark cannot hit oracle's index when column type is DATE
Key: SPARK-47542
URL: https://issues.apache.org/jira/browse/SPARK-47542
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.2.4
Reporter: Danke Liu
When I use spark's jdbc to pull data from oracle, it will not hit the index if
the pushed filter's type in oralce is DATE.
Here is my scenario:
first I created a dataframe that read from oracle:
val df = spark.read.format("jdbc").
option("url", url).
option("driver", driver).
option("user", user).
option("password", passwd).
option("dbtable", "select * from foobar.tbl1")
.load()
then I pushed a filter to the dataframe like this:
df.filter("""`update_time` >= to_date('2024-03-12 06:18:17', 'yyyy-MM-dd
HH:mm:ss') """).count()
this will not hit the index on update_time column.
Reason:
the update_time column in oracle is DATE type, this mapped to spark has became
Timestamp(because precision of DATE in oracle is second), and when I pushed a
filter to oracle, it will hit the codes bellow:
// class is org.apache.spark.sql.jdbc.OracleDialect
override def compileValue(value: Any): Any = value match {
// The JDBC drivers support date literals in SQL statements written in the
// format: \{d 'yyyy-mm-dd'} and timestamp literals in SQL statements
written
// in the format: \{ts 'yyyy-mm-dd hh:mm:ss.f...'}. For details, see
// 'Oracle Database JDBC Developer’s Guide and Reference, 11g Release 1
(11.1)'
// Appendix A Reference Information.
case stringValue: String => s"'${escapeSql(stringValue)}'"
case timestampValue: Timestamp => "\{ts '" + timestampValue + "'}"
case dateValue: Date => "\{d '" + dateValue + "'}"
case arrayValue: Array[Any] => arrayValue.map(compileValue).mkString(", ")
case _ => value
}
and this "update_time >= \{ts '2024-03-12 06:18:17'}" will never hit the index.
In my case, as a work around, I just change the code to this:
{color:#cc7832}case {color}timestampValue: Timestamp
=>{color:#6a8759}s"{color}{color:#6a8759}to_date({color}{{color:#9876aa}dateFormat{color}.format(timestampValue)}{color:#6a8759},'yyyy-MM-dd
HH:mi:ss'){color}{color:#6a8759}"{color}
then it worked well.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]