[ 
https://issues.apache.org/jira/browse/SPARK-32347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ihor Bobak updated SPARK-32347:
-------------------------------
    Description: 
The bug is easily reproduced: run the same code below in Spark 2.4.3 and in 
Spark 3.0.0.

Spark 3.0.0 fails with a misleading "column can't be resolved" error, although 
the SQL statement appears to be valid and runs fine in Spark 2.4.3.
{code:python}
import pandas as pd

pdf_sales = pd.DataFrame([(1, 10), (2, 20)], columns=["BuyerID", "Qty"])
pdf_buyers = pd.DataFrame([(1, "John"), (2, "Jack")], columns=["BuyerID", "BuyerName"])

df_sales = spark.createDataFrame(pdf_sales)
df_buyers = spark.createDataFrame(pdf_buyers)

df_sales.createOrReplaceTempView("df_sales")
df_buyers.createOrReplaceTempView("df_buyers")

spark.sql("""
    with b as (
        select /*+ BROADCAST(df_buyers) */
            BuyerID, BuyerName 
        from df_buyers
    )
    select 
        b.BuyerID,
        b.BuyerName,
        s.Qty
    from df_sales s
        inner join b on s.BuyerID =  b.BuyerID
""").toPandas()
{code}

  was:
The bug is easily reproduced: run the same code below in Spark 2.4.3 and in 
Spark 3.0.0.

The SQL parser raises a misleading error message, although the SQL statement 
appears to be valid.
{code:python}
import pandas as pd

pdf_sales = pd.DataFrame([(1, 10), (2, 20)], columns=["BuyerID", "Qty"])
pdf_buyers = pd.DataFrame([(1, "John"), (2, "Jack")], columns=["BuyerID", "BuyerName"])

df_sales = spark.createDataFrame(pdf_sales)
df_buyers = spark.createDataFrame(pdf_buyers)

df_sales.createOrReplaceTempView("df_sales")
df_buyers.createOrReplaceTempView("df_buyers")

spark.sql("""
    with b as (
        select /*+ BROADCAST(df_buyers) */
            BuyerID, BuyerName 
        from df_buyers
    )
    select 
        b.BuyerID,
        b.BuyerName,
        s.Qty
    from df_sales s
        inner join b on s.BuyerID =  b.BuyerID
""").toPandas()
{code}


> BROADCAST hint makes a weird message that "column can't be resolved" (it was 
> OK in Spark 2.4)
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-32347
>                 URL: https://issues.apache.org/jira/browse/SPARK-32347
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: Spark 3.0.0, Jupyter notebook; Spark launched in 
> local[4] mode, but it fails the same way on a Standalone cluster.
>            Reporter: Ihor Bobak
>            Priority: Major
>             Fix For: 3.0.1
>
>         Attachments: 2020-07-17 17_46_32-Window.png, 2020-07-17 
> 17_49_27-Window.png, 2020-07-17 17_52_51-Window.png
>
>
> The bug is easily reproduced: run the same code below in Spark 2.4.3 and in 
> Spark 3.0.0.
> Spark 3.0.0 fails with a misleading "column can't be resolved" error, 
> although the SQL statement appears to be valid and runs fine in Spark 2.4.3.
> {code:python}
> import pandas as pd
> pdf_sales = pd.DataFrame([(1, 10), (2, 20)], columns=["BuyerID", "Qty"])
> pdf_buyers = pd.DataFrame([(1, "John"), (2, "Jack")], columns=["BuyerID", "BuyerName"])
> df_sales = spark.createDataFrame(pdf_sales)
> df_buyers = spark.createDataFrame(pdf_buyers)
> df_sales.createOrReplaceTempView("df_sales")
> df_buyers.createOrReplaceTempView("df_buyers")
> spark.sql("""
>     with b as (
>         select /*+ BROADCAST(df_buyers) */
>             BuyerID, BuyerName 
>         from df_buyers
>     )
>     select 
>         b.BuyerID,
>         b.BuyerName,
>         s.Qty
>     from df_sales s
>         inner join b on s.BuyerID =  b.BuyerID
> """).toPandas()
> {code}
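
One way to check whether the BROADCAST hint itself triggers the error is to run the same query with the hint comment removed and compare the result on both Spark versions. The report does not confirm the root cause, so this is only a diagnostic sketch. The names `HINT_PATTERN` and `strip_hints` are hypothetical helpers, not part of Spark or of the original report, and the regex only handles the `/*+ ... */` hint-comment form used in the query above.

```python
import re

# Hypothetical helper (not a Spark API): matches SQL hint comments of the
# form /*+ ... */ plus any whitespace that follows them.
HINT_PATTERN = re.compile(r"/\*\+.*?\*/\s*", re.DOTALL)


def strip_hints(sql: str) -> str:
    """Return the SQL text with every /*+ ... */ hint comment removed."""
    return HINT_PATTERN.sub("", sql)


# The query from the report, unchanged.
query = """
    with b as (
        select /*+ BROADCAST(df_buyers) */
            BuyerID, BuyerName
        from df_buyers
    )
    select
        b.BuyerID,
        b.BuyerName,
        s.Qty
    from df_sales s
        inner join b on s.BuyerID = b.BuyerID
"""

# Same query with the hint removed, for a side-by-side comparison.
query_without_hints = strip_hints(query)
```

With the temp views from the report registered, `spark.sql(query_without_hints).toPandas()` can then be compared against `spark.sql(query).toPandas()` in each Spark version. If only the hinted variant fails, that points at hint handling rather than at the rest of the statement.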



