[
https://issues.apache.org/jira/browse/SPARK-32347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ihor Bobak updated SPARK-32347:
-------------------------------
Attachment: 2020-07-17 17_46_32-Window.png
2020-07-17 17_49_27-Window.png
2020-07-17 17_52_51-Window.png
> BROADCAST hint makes a weird message that "column can't be resolved" (it was
> OK in Spark 2.4)
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-32347
> URL: https://issues.apache.org/jira/browse/SPARK-32347
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Environment: Spark 3.0.0, Jupyter notebook. Spark was launched in
> local[4] mode, but it fails the same way on a Standalone cluster.
>
>
> Reporter: Ihor Bobak
> Priority: Major
> Fix For: 3.0.1
>
> Attachments: 2020-07-17 17_46_32-Window.png, 2020-07-17
> 17_49_27-Window.png, 2020-07-17 17_52_51-Window.png
>
>
> The bug is easy to reproduce: run the code below in Spark 2.4.3 and in
> 3.0.0.
> It works in 2.4.3, but 3.0.0 raises a misleading "column can't be resolved"
> error, although everything seems to be OK with the SQL statement.
> {code:python}
> import pandas as pd
>
> # `spark` is the SparkSession already available in the notebook.
> pdf_sales = pd.DataFrame([(1, 10), (2, 20)], columns=["BuyerID", "Qty"])
> pdf_buyers = pd.DataFrame([(1, "John"), (2, "Jack")],
>                           columns=["BuyerID", "BuyerName"])
>
> df_sales = spark.createDataFrame(pdf_sales)
> df_buyers = spark.createDataFrame(pdf_buyers)
>
> # Register both DataFrames as temp views so the SQL below can reference them.
> df_sales.createOrReplaceTempView("df_sales")
> df_buyers.createOrReplaceTempView("df_buyers")
>
> # The BROADCAST hint inside the CTE is what triggers the error in 3.0.0.
> spark.sql("""
>     with b as (
>         select /*+ BROADCAST(df_buyers) */
>             BuyerID, BuyerName
>         from df_buyers
>     )
>     select
>         b.BuyerID,
>         b.BuyerName,
>         s.Qty
>     from df_sales s
>     inner join b on s.BuyerID = b.BuyerID
> """).toPandas()
> {code}