[
https://issues.apache.org/jira/browse/SPARK-12586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Rosen updated SPARK-12586:
-------------------------------
Description:
The following sequence of sql(), registerTempTable() calls gets the wrong
answer.
The correct answer is returned if the temp table is rewritten?
{code}
sql_text = """select row, col, foo, bar, value2 value
from (select row, col, foo, bar, 8 value2 from t0 where row=1
and col=2) s1
union select row, col, foo, bar, value from t0 where not
(row=1 and col=2)"""
df2 = sqlContext.sql(sql_text)
df2.registerTempTable("t1")
# # The following 2 line workaround fixes the problem somehow?
# df3 = sqlContext.createDataFrame(df2.collect())
# df3.registerTempTable("t1")
# # The following 4 line workaround fixes the problem too..but takes way
longer
# filename = "t1.json"
# df2.write.json(filename, mode='overwrite')
# df3 = sqlContext.read.json(filename)
# df3.registerTempTable("t1")
sql_text2 = """select row, col, v1 value from
(select v1 from
(select v_value v1 from values) s1
left join
(select value v2,foo,bar,row,col from t1
where foo=1
and bar=2 and value is not null) s2
on v1=v2 where v2 is null
) sa join
(select row, col from t1 where foo=1
and bar=2 and value is null) sb"""
result = sqlContext.sql(sql_text2)
result.show()
# Expected result
# +---+---+-----+
# |row|col|value|
# +---+---+-----+
# | 3| 4| 1|
# | 3| 4| 2|
# | 3| 4| 3|
# | 3| 4| 4|
# +---+---+-----+
# Getting this wrong result...when not using the workarounds above
# +---+---+-----+
# |row|col|value|
# +---+---+-----+
# +---+---+-----+
{code}
was:
The following sequence of sql(), registerTempTable() calls gets the wrong
answer.
The correct answer is returned if the temp table is rewritten?
sql_text = """select row, col, foo, bar, value2 value
from (select row, col, foo, bar, 8 value2 from t0 where row=1
and col=2) s1
union select row, col, foo, bar, value from t0 where not
(row=1 and col=2)"""
df2 = sqlContext.sql(sql_text)
df2.registerTempTable("t1")
# # The following 2 line workaround fixes the problem somehow?
# df3 = sqlContext.createDataFrame(df2.collect())
# df3.registerTempTable("t1")
# # The following 4 line workaround fixes the problem too..but takes way
longer
# filename = "t1.json"
# df2.write.json(filename, mode='overwrite')
# df3 = sqlContext.read.json(filename)
# df3.registerTempTable("t1")
sql_text2 = """select row, col, v1 value from
(select v1 from
(select v_value v1 from values) s1
left join
(select value v2,foo,bar,row,col from t1
where foo=1
and bar=2 and value is not null) s2
on v1=v2 where v2 is null
) sa join
(select row, col from t1 where foo=1
and bar=2 and value is null) sb"""
result = sqlContext.sql(sql_text2)
result.show()
# Expected result
# +---+---+-----+
# |row|col|value|
# +---+---+-----+
# | 3| 4| 1|
# | 3| 4| 2|
# | 3| 4| 3|
# | 3| 4| 4|
# +---+---+-----+
# Getting this wrong result...when not using the workarounds above
# +---+---+-----+
# |row|col|value|
# +---+---+-----+
# +---+---+-----+
> Wrong answer with registerTempTable and union sql query
> -------------------------------------------------------
>
> Key: SPARK-12586
> URL: https://issues.apache.org/jira/browse/SPARK-12586
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2
> Environment: Windows 7
> Reporter: shao lo
> Attachments: sql_bug.py
>
>
> The following sequence of sql(), registerTempTable() calls gets the wrong
> answer.
> The correct answer is returned if the temp table is rewritten?
> {code}
> sql_text = """select row, col, foo, bar, value2 value
> from (select row, col, foo, bar, 8 value2 from t0 where row=1
> and col=2) s1
> union select row, col, foo, bar, value from t0 where
> not (row=1 and col=2)"""
> df2 = sqlContext.sql(sql_text)
> df2.registerTempTable("t1")
> # # The following 2 line workaround fixes the problem somehow?
> # df3 = sqlContext.createDataFrame(df2.collect())
> # df3.registerTempTable("t1")
> # # The following 4 line workaround fixes the problem too..but takes way
> longer
> # filename = "t1.json"
> # df2.write.json(filename, mode='overwrite')
> # df3 = sqlContext.read.json(filename)
> # df3.registerTempTable("t1")
> sql_text2 = """select row, col, v1 value from
> (select v1 from
> (select v_value v1 from values) s1
> left join
> (select value v2,foo,bar,row,col from t1
> where foo=1
> and bar=2 and value is not null) s2
> on v1=v2 where v2 is null
> ) sa join
> (select row, col from t1 where foo=1
> and bar=2 and value is null) sb"""
> result = sqlContext.sql(sql_text2)
> result.show()
>
> # Expected result
> # +---+---+-----+
> # |row|col|value|
> # +---+---+-----+
> # | 3| 4| 1|
> # | 3| 4| 2|
> # | 3| 4| 3|
> # | 3| 4| 4|
> # +---+---+-----+
> # Getting this wrong result...when not using the workarounds above
> # +---+---+-----+
> # |row|col|value|
> # +---+---+-----+
> # +---+---+-----+
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]