[jira] [Updated] (SPARK-18593) JDBCRDD fails to filter CHAR type column for PostgreSQL

Dongjoon Hyun (JIRA) Sun, 27 Nov 2016 02:06:08 -0800

     [ 
https://issues.apache.org/jira/browse/SPARK-18593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dongjoon Hyun updated SPARK-18593:
----------------------------------
    Description: 
In Apache Spark 1.6.x, JDBCRDD returns incorrect results for a query with 
filters on CHAR column with PostgreSQL CHAR type. The root cause is PostgreSQL 
returns `space padded string` for a result. So, the post processing filter 
`Filter (a#0 = A)` is evaluated false. Spark 2.0.0 removes the post filter 
because it is already handled in the database by `PushedFilters: 
[EqualTo(a,A)]`.

{code}
scala> val t_char = sqlContext.read.option("user", 
"postgres").option("password", 
"rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_char", new 
java.util.Properties())
t_char: org.apache.spark.sql.DataFrame = [a: string]

scala> val t_varchar = sqlContext.read.option("user", 
"postgres").option("password", 
"rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_varchar", new 
java.util.Properties())
t_varchar: org.apache.spark.sql.DataFrame = [a: string]

scala> t_char.show
+----------+
|         a|
+----------+
|A         |
|AA        |
|AAA       |
+----------+


scala> t_varchar.show
+---+
|  a|
+---+
|  A|
| AA|
|AAA|
+---+


scala> t_char.filter(t_char("a")==="A").show
+---+
|  a|
+---+
+---+


scala> t_char.filter(t_char("a")==="A         ").show
+----------+
|         a|
+----------+
|A         |
+----------+


scala> t_varchar.filter(t_varchar("a")==="A").show
+---+
|  a|
+---+
|  A|
+---+


scala> t_char.filter(t_char("a")==="A").explain
== Physical Plan ==
Filter (a#0 = A)
+- Scan 
JDBCRelation(jdbc:postgresql://localhost:5432/postgres,t_char,[Lorg.apache.spark.Partition;@2f65c341,{user=postgres,
 password=rootpass})[a#0] PushedFilters: [EqualTo(a,A)]
{code}

  was:
when using filter using jdbc.. unable to retrieve data if the column value is 
having only value of length one. 

for example:  
Below code works because the filter condition data is of length > 1
val dataFromTable = 
sqlContext.read.jdbc("jdbc:postgresql://localhost:5433/postgres", "stocknames", 
defaultProps)
  dataFromTable.filter(dataFromTable("grouptype")==="ZP").show()

result: 

+------------+----------+--------------------+------+---------+---------+------------+--------------------+----------+
|securitycode|securityid|        securityname|status|grouptype|facevalue|      
isinno|            industry|instrument|
+------------+----------+--------------------+------+---------+---------+------------+--------------------+----------+
|      504370|  ZARDIINV|ARDI INVESTMENT &...|Active|       ZP|     10.0|       
     |Misc.Commercial S...|    Equity|
|      512221|     TARCF|TARRIF CINE &amp;...|Active|       ZP|     
10.0|INE177E01010|Finance (includin...|    Equity|
|      512505|    MEENST|MEENAKSHI STEEL I...|Active|       ZP|     10.0|       
     |Iron &amp; Steel ...|    Equity|
|      514060|   EVERTEX|Evergreen Textile...|Active|       ZP|     
10.0|INE229N01010|            Textiles|    Equity|
|      526085|   SGARRES|SAGAR TOURIST RES...|Active|       ZP|     10.0|NA     
     |              Hotels|    Equity|
+------------+----------+--------------------+------+---------+---------+------------+--------------------+----------+

But if the same filter condition code is of length 1, no data is filtered. 

eg: 

 val dataFromTable = 
sqlContext.read.jdbc("jdbc:postgresql://localhost:5433/postgres", "stocknames", 
defaultProps)
  dataFromTable.filter(dataFromTable("grouptype")==="A").show()

not getting any output

+------------+----------+------------+------+---------+---------+------+--------+----------+
|securitycode|securityid|securityname|status|grouptype|facevalue|isinno|industry|instrument|
+------------+----------+------------+------+---------+---------+------+--------+----------+
+------------+----------+------------+------+---------+---------+------+--------+----------+


> JDBCRDD fails to filter CHAR type column for PostgreSQL
> -------------------------------------------------------
>
>                 Key: SPARK-18593
>                 URL: https://issues.apache.org/jira/browse/SPARK-18593
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2, 1.6.3
>            Reporter: Durga Prasad Gunturu
>            Priority: Minor
>
> In Apache Spark 1.6.x, JDBCRDD returns incorrect results for a query with 
> filters on CHAR column with PostgreSQL CHAR type. The root cause is 
> PostgreSQL returns `space padded string` for a result. So, the post 
> processing filter `Filter (a#0 = A)` is evaluated false. Spark 2.0.0 removes 
> the post filter because it is already handled in the database by 
> `PushedFilters: [EqualTo(a,A)]`.
> {code}
> scala> val t_char = sqlContext.read.option("user", 
> "postgres").option("password", 
> "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_char", new 
> java.util.Properties())
> t_char: org.apache.spark.sql.DataFrame = [a: string]
> scala> val t_varchar = sqlContext.read.option("user", 
> "postgres").option("password", 
> "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_varchar", 
> new java.util.Properties())
> t_varchar: org.apache.spark.sql.DataFrame = [a: string]
> scala> t_char.show
> +----------+
> |         a|
> +----------+
> |A         |
> |AA        |
> |AAA       |
> +----------+
> scala> t_varchar.show
> +---+
> |  a|
> +---+
> |  A|
> | AA|
> |AAA|
> +---+
> scala> t_char.filter(t_char("a")==="A").show
> +---+
> |  a|
> +---+
> +---+
> scala> t_char.filter(t_char("a")==="A         ").show
> +----------+
> |         a|
> +----------+
> |A         |
> +----------+
> scala> t_varchar.filter(t_varchar("a")==="A").show
> +---+
> |  a|
> +---+
> |  A|
> +---+
> scala> t_char.filter(t_char("a")==="A").explain
> == Physical Plan ==
> Filter (a#0 = A)
> +- Scan 
> JDBCRelation(jdbc:postgresql://localhost:5432/postgres,t_char,[Lorg.apache.spark.Partition;@2f65c341,{user=postgres,
>  password=rootpass})[a#0] PushedFilters: [EqualTo(a,A)]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (SPARK-18593) JDBCRDD fails to filter CHAR type column for PostgreSQL

Reply via email to