[
https://issues.apache.org/jira/browse/SPARK-18593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-18593:
----------------------------------
Description:
In Apache Spark 1.6.x, JDBCRDD returns incorrect results for a query with
filters on CHAR column with PostgreSQL CHAR type. The root cause is PostgreSQL
returns `space padded string` for a result. So, the post processing filter
`Filter (a#0 = A)` is evaluated false. Spark 2.0.0 removes the post filter
because it is already handled in the database by `PushedFilters:
[EqualTo(a,A)]`.
{code}
scala> val t_char = sqlContext.read.option("user",
"postgres").option("password",
"rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_char", new
java.util.Properties())
t_char: org.apache.spark.sql.DataFrame = [a: string]
scala> val t_varchar = sqlContext.read.option("user",
"postgres").option("password",
"rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_varchar", new
java.util.Properties())
t_varchar: org.apache.spark.sql.DataFrame = [a: string]
scala> t_char.show
+----------+
| a|
+----------+
|A |
|AA |
|AAA |
+----------+
scala> t_varchar.show
+---+
| a|
+---+
| A|
| AA|
|AAA|
+---+
scala> t_char.filter(t_char("a")==="A").show
+---+
| a|
+---+
+---+
scala> t_char.filter(t_char("a")==="A ").show
+----------+
| a|
+----------+
|A |
+----------+
scala> t_varchar.filter(t_varchar("a")==="A").show
+---+
| a|
+---+
| A|
+---+
scala> t_char.filter(t_char("a")==="A").explain
== Physical Plan ==
Filter (a#0 = A)
+- Scan
JDBCRelation(jdbc:postgresql://localhost:5432/postgres,t_char,[Lorg.apache.spark.Partition;@2f65c341,{user=postgres,
password=rootpass})[a#0] PushedFilters: [EqualTo(a,A)]
{code}
was:
when using filter using jdbc.. unable to retrieve data if the column value is
having only value of length one.
for example:
Below code works because the filter condition data is of length > 1
val dataFromTable =
sqlContext.read.jdbc("jdbc:postgresql://localhost:5433/postgres", "stocknames",
defaultProps)
dataFromTable.filter(dataFromTable("grouptype")==="ZP").show()
result:
+------------+----------+--------------------+------+---------+---------+------------+--------------------+----------+
|securitycode|securityid| securityname|status|grouptype|facevalue|
isinno| industry|instrument|
+------------+----------+--------------------+------+---------+---------+------------+--------------------+----------+
| 504370| ZARDIINV|ARDI INVESTMENT &...|Active| ZP| 10.0|
|Misc.Commercial S...| Equity|
| 512221| TARCF|TARRIF CINE &...|Active| ZP|
10.0|INE177E01010|Finance (includin...| Equity|
| 512505| MEENST|MEENAKSHI STEEL I...|Active| ZP| 10.0|
|Iron & Steel ...| Equity|
| 514060| EVERTEX|Evergreen Textile...|Active| ZP|
10.0|INE229N01010| Textiles| Equity|
| 526085| SGARRES|SAGAR TOURIST RES...|Active| ZP| 10.0|NA
| Hotels| Equity|
+------------+----------+--------------------+------+---------+---------+------------+--------------------+----------+
But if the same filter condition code is of length 1, no data is filtered.
eg:
val dataFromTable =
sqlContext.read.jdbc("jdbc:postgresql://localhost:5433/postgres", "stocknames",
defaultProps)
dataFromTable.filter(dataFromTable("grouptype")==="A").show()
not getting any output
+------------+----------+------------+------+---------+---------+------+--------+----------+
|securitycode|securityid|securityname|status|grouptype|facevalue|isinno|industry|instrument|
+------------+----------+------------+------+---------+---------+------+--------+----------+
+------------+----------+------------+------+---------+---------+------+--------+----------+
> JDBCRDD fails to filter CHAR type column for PostgreSQL
> -------------------------------------------------------
>
> Key: SPARK-18593
> URL: https://issues.apache.org/jira/browse/SPARK-18593
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.2, 1.6.3
> Reporter: Durga Prasad Gunturu
> Priority: Minor
>
> In Apache Spark 1.6.x, JDBCRDD returns incorrect results for a query with
> filters on CHAR column with PostgreSQL CHAR type. The root cause is
> PostgreSQL returns `space padded string` for a result. So, the post
> processing filter `Filter (a#0 = A)` is evaluated false. Spark 2.0.0 removes
> the post filter because it is already handled in the database by
> `PushedFilters: [EqualTo(a,A)]`.
> {code}
> scala> val t_char = sqlContext.read.option("user",
> "postgres").option("password",
> "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_char", new
> java.util.Properties())
> t_char: org.apache.spark.sql.DataFrame = [a: string]
> scala> val t_varchar = sqlContext.read.option("user",
> "postgres").option("password",
> "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_varchar",
> new java.util.Properties())
> t_varchar: org.apache.spark.sql.DataFrame = [a: string]
> scala> t_char.show
> +----------+
> | a|
> +----------+
> |A |
> |AA |
> |AAA |
> +----------+
> scala> t_varchar.show
> +---+
> | a|
> +---+
> | A|
> | AA|
> |AAA|
> +---+
> scala> t_char.filter(t_char("a")==="A").show
> +---+
> | a|
> +---+
> +---+
> scala> t_char.filter(t_char("a")==="A ").show
> +----------+
> | a|
> +----------+
> |A |
> +----------+
> scala> t_varchar.filter(t_varchar("a")==="A").show
> +---+
> | a|
> +---+
> | A|
> +---+
> scala> t_char.filter(t_char("a")==="A").explain
> == Physical Plan ==
> Filter (a#0 = A)
> +- Scan
> JDBCRelation(jdbc:postgresql://localhost:5432/postgres,t_char,[Lorg.apache.spark.Partition;@2f65c341,{user=postgres,
> password=rootpass})[a#0] PushedFilters: [EqualTo(a,A)]
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]