[
https://issues.apache.org/jira/browse/SPARK-18593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15699431#comment-15699431
]
Dongjoon Hyun commented on SPARK-18593:
---------------------------------------
Although this is a correctness issue, there is a workaround: users can declare
the column as TEXT or VARCHAR instead of CHAR.
In addition, I'm not sure there will be an Apache Spark 1.6.4 release.
BTW, the only correct way to fix this issue is to backport the related feature.
So, I'll make a PR for this.
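
For readers who want to see the mismatch without a database, here is a minimal plain-Scala sketch of the two-step evaluation described in the issue. It is only a model under stated assumptions, not Spark's actual code: the names `Table`, `pushedFilter`, `postFilter`, `spark16` and `spark20` are hypothetical, and the CHAR(10) width is inferred from the 10-character column in the report.

```scala
object CharPaddingSketch {
  // Assumed column width, CHAR(10), inferred from the output in the report.
  val width = 10

  // Right-pad with spaces, as PostgreSQL does when returning CHAR(n) values.
  def pad(s: String): String = s + " " * (width - s.length)

  // Strip trailing spaces, which CHAR comparison treats as insignificant.
  def rtrim(s: String): String = s.replaceAll(" +$", "")

  // Hypothetical model of a table: its returned rows plus its comparison rule.
  final case class Table(rows: Seq[String], charSemantics: Boolean)

  val values: Seq[String] = Seq("A", "AA", "AAA")
  val tChar: Table = Table(values.map(pad), charSemantics = true)
  val tVarchar: Table = Table(values, charSemantics = false)

  // Step 1: the filter pushed down to the database.
  def pushedFilter(t: Table, lit: String): Seq[String] =
    if (t.charSemantics) t.rows.filter(r => rtrim(r) == rtrim(lit))
    else t.rows.filter(_ == lit)

  // Step 2: the exact string comparison re-applied on the returned rows.
  def postFilter(rows: Seq[String], lit: String): Seq[String] =
    rows.filter(_ == lit)

  // 1.6.x model: pushed filter, then the post-processing filter.
  def spark16(t: Table, lit: String): Seq[String] =
    postFilter(pushedFilter(t, lit), lit)

  // 2.0.0 model: the pushed filter alone.
  def spark20(t: Table, lit: String): Seq[String] =
    pushedFilter(t, lit)

  def main(args: Array[String]): Unit = {
    println(spark16(tChar, "A").size)    // CHAR column, both filters
    println(spark16(tVarchar, "A").size) // VARCHAR column, both filters
    println(spark20(tChar, "A").size)    // CHAR column, pushed filter only
  }
}
```

In this model the CHAR row survives the pushed filter but is dropped by the second, exact comparison, while the VARCHAR row passes both. That is why switching the column type works as a workaround.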
> JDBCRDD returns incorrect results for a query with filters on CHAR type
> column of PostgreSQL
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-18593
> URL: https://issues.apache.org/jira/browse/SPARK-18593
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.2, 1.6.3
> Reporter: Durga Prasad Gunturu
> Priority: Minor
> Labels: correctness
>
> In Apache Spark 1.6.x, JDBCRDD returns incorrect results for a query with
> filters on a PostgreSQL CHAR column. The root cause is that PostgreSQL
> returns a space-padded string for the result, so the post-processing
> filter `Filter (a#0 = A)` evaluates to false. Spark 2.0.0 removes
> the post filter because it is already handled in the database by
> `PushedFilters: [EqualTo(a,A)]`.
> {code}
> scala> val t_char = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_char", new java.util.Properties())
> t_char: org.apache.spark.sql.DataFrame = [a: string]
> scala> val t_varchar = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_varchar", new java.util.Properties())
> t_varchar: org.apache.spark.sql.DataFrame = [a: string]
> scala> t_char.show
> +----------+
> |         a|
> +----------+
> |A         |
> |AA        |
> |AAA       |
> +----------+
> scala> t_varchar.show
> +---+
> |  a|
> +---+
> |  A|
> | AA|
> |AAA|
> +---+
> scala> t_char.filter(t_char("a")==="A").show
> +---+
> |  a|
> +---+
> +---+
> scala> t_char.filter(t_char("a")==="A         ").show
> +----------+
> |         a|
> +----------+
> |A         |
> +----------+
> scala> t_varchar.filter(t_varchar("a")==="A").show
> +---+
> |  a|
> +---+
> |  A|
> +---+
> scala> t_char.filter(t_char("a")==="A").explain
> == Physical Plan ==
> Filter (a#0 = A)
> +- Scan JDBCRelation(jdbc:postgresql://localhost:5432/postgres,t_char,[Lorg.apache.spark.Partition;@2f65c341,{user=postgres, password=rootpass})[a#0] PushedFilters: [EqualTo(a,A)]
> {code}