[ https://issues.apache.org/jira/browse/SPARK-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962404#comment-15962404 ]

Hyukjin Kwon commented on SPARK-20259:
--------------------------------------

If so, I guess it is a duplicate of SPARK-12449. Assuming this refers to pushing
down the join, I'd close this if it isn't updated for a while, say a few days to
a couple of weeks.

> Support push down join optimizations in DataFrameReader when loading from JDBC
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-20259
>                 URL: https://issues.apache.org/jira/browse/SPARK-20259
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.2, 2.1.0
>            Reporter: John Muller
>            Priority: Minor
>
> Given two dataframes loaded from the same JDBC connection:
> {code:title=UnoptimizedJDBCJoin.scala|borderStyle=solid}
> val ordersDF = spark.read
>   .format("jdbc")
>   .option("url", "jdbc:postgresql:dbserver")
>   .option("dbtable", "northwind.orders")
>   .option("user", "username")
>   .option("password", "password")
>   .load()
>   
> val productDF = spark.read
>   .format("jdbc")
>   .option("url", "jdbc:postgresql:dbserver")
>   .option("dbtable", "northwind.product")
>   .option("user", "username")
>   .option("password", "password")
>   .load()
>   
> ordersDF.createOrReplaceTempView("orders")
> productDF.createOrReplaceTempView("product")
> // Followed by a join between them:
> val ordersByProduct = spark.sql("""SELECT p.name, SUM(o.qty) AS qty FROM orders AS o
>   INNER JOIN product AS p ON o.product_id = p.product_id GROUP BY p.name""")
> {code}
> Catalyst should push the whole query down to the database as:
> SELECT northwind.product.name, SUM(northwind.orders.qty) AS qty
> FROM northwind.orders
> INNER JOIN northwind.product ON
>   northwind.orders.product_id = northwind.product.product_id
> GROUP BY northwind.product.name
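Until Catalyst can push such a join down itself, one workaround is to run the join in the database by hand: Spark's JDBC source accepts anything that is valid in a SQL FROM clause as the `dbtable` option, including a parenthesized, aliased subquery. The sketch below builds that subquery from the table and column names in the report. The object `PushedDownJoin`, the helper `ordersByProductSubquery`, and the alias `orders_by_product` are hypothetical names chosen for illustration; this is a manual workaround, not the optimizer pushdown requested here.

```scala
// Hypothetical helper (not a Spark API): builds a parenthesized, aliased
// subquery that performs the orders/product join and aggregation inside the
// database, suitable for passing as the JDBC "dbtable" option.
object PushedDownJoin {
  def ordersByProductSubquery(schema: String): String =
    s"""(SELECT p.name, SUM(o.qty) AS qty
       |  FROM $schema.orders AS o
       |  INNER JOIN $schema.product AS p ON o.product_id = p.product_id
       |  GROUP BY p.name) AS orders_by_product""".stripMargin

  def main(args: Array[String]): Unit = {
    val dbtable = ordersByProductSubquery("northwind")
    println(dbtable)
    // With a SparkSession `spark` in scope (assumed, e.g. in spark-shell),
    // the pre-joined result would be loaded as a single JDBC relation:
    //   spark.read.format("jdbc")
    //     .option("url", "jdbc:postgresql:dbserver")
    //     .option("dbtable", dbtable)
    //     .option("user", "username")
    //     .option("password", "password")
    //     .load()
  }
}
```

The alias after the closing parenthesis matters: PostgreSQL rejects a subquery in FROM without one, and Spark wraps the `dbtable` value in its own SELECT when it queries the schema and reads rows.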



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)