[ 
https://issues.apache.org/jira/browse/SPARK-27203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796946#comment-16796946
 ] 

Igor Ngouagna commented on SPARK-27203:
---------------------------------------

Furthermore, I notice something weird when i tried to create the view from 
spark.

If the view code is basic (like the one above), everything works well. That is 
the view is readable from Spark, and from Beeline.

However, if the view code is a little more complex, and the view is *created 
via spark.sql*, reading it from spark.sql is OK, but *reading it from beeline 
fails*!

For example considering the following view created via Spark SQL:

View Creation
{code:sql}
spark.sql("CREATE VIEW IF NOT EXISTS db.test_cte_view AS\
  with cte as (select * from db.test_cte union all select * from 
db.test_cte_2),\
           tmp as (SELECT id, idate, ROW_NUMBER() over(PARTITION BY id ORDER BY 
idate desc ) AS row_num from cte)\
  SELECT cte.* from cte\
        join (SELECT * from tmp where tmp.row_num =1) tmp_2\
        on cte.id = tmp_2.id\
        and cte.idate = tmp_2.idate")
{code}
When you do
{code:sql}
beeline> select * from db.test_cte_view;
{code}
the output is like
{code}
Error: Error while compiling statement: FAILED: SemanticException line 1:330 
Failed to recognize predicate 'UNION'. Failed rule: 'identifier' in subquery 
source in definition of VIEW test_cte_view [
SELECT `gen_attr_0` AS `id`, `gen_attr_1` AS `status`, `gen_attr_2` AS `idate` 
FROM (SELECT `gen_attr_0`, `gen_attr_1`, `gen_attr_2` FROM ((SELECT 
`gen_attr_0`, `gen_attr_1`, `gen_attr_2` FROM (SELECT `id` AS `gen_attr_0`, 
`status` AS `gen_attr_1`, `idate` AS `gen_attr_2` FROM `db`.`test_cte`) AS 
gen_subquery_0) UNION ALL (SELECT `gen_attr_5`, `gen_attr_6`, `gen_attr_7` FROM 
(SELECT `id` AS `gen_attr_5`, `status` AS `gen_attr_6`, `idate` AS `gen_attr_7` 
FROM `db`.`test_cte_2`) AS gen_subquery_1)) AS cte INNER JOIN (SELECT 
`gen_attr_3`, `gen_attr_4`, `gen_attr_8` FROM (SELECT `gen_attr_3`, 
`gen_attr_4`, `gen_attr_8` FROM (SELECT gen_subquery_4.`gen_attr_3`, 
gen_subquery_4.`gen_attr_4`, row_number() OVER (PARTITION BY `gen_attr_3` ORDER 
BY `gen_attr_4` DESC NULLS LAST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT 
ROW) AS `gen_attr_8` FROM (SELECT `gen_attr_3`, `gen_attr_4` FROM ((SELECT 
`gen_attr_3`, `gen_attr_9`, `gen_attr_4` FROM (SELECT `id` AS `gen_attr_3`, 
`status` AS `gen_attr_9`, `idate` AS `gen_attr_4` FROM `db`.`test_cte`) AS 
gen_subquery_2) UNION ALL (SELECT `gen_attr_5`, `gen_attr_6`, `gen_attr_7` FROM 
(SELECT `id` AS `gen_attr_5`, `status` AS `gen_attr_6`, `idate` AS `gen_attr_7` 
FROM `db`.`test_cte_2`) AS gen_subquery_3)) AS cte) AS gen_subquery_4) AS 
gen_subquery_5) AS tmp WHERE (`gen_attr_8` = 1)) AS tmp_2 ON ((`gen_attr_0` = 
`gen_attr_3`) AND (`gen_attr_2` = `gen_attr_4`))) AS cte
] used as test_cte_view at Line 1:14 (state=42000,code=40000)
{code}
*Tables for test*:
{code:sql}
CREATE TABLE db.test_cte(
id int, 
status string, 
idate date )
  
CREATE TABLE db.test_cte_2(
id int, 
status string, 
idate date )
{code}

> Spark Fails to read a view using CTE (WITH clause) and created via beeline 
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-27203
>                 URL: https://issues.apache.org/jira/browse/SPARK-27203
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Igor Ngouagna
>            Priority: Major
>
> Spark fails when trying to read a view which code involve CTE, and which is 
> created via beeline.
> For example, considering the following view, created via Beeline:
> {code:sql}
> create view db.test as 
> with q1 as (select 1 as n)
> select n from q1
> {code}
> When you do
> {code:java}
> spark.sql("select * from db.test").show()
> {code}
> The output is like
> {code}
> 'Table or view not found: q1; line 2 pos 14'
> Traceback (most recent call last):
>   File 
> "/DATA/fs11/hadoop/yarn/local/usercache/ingouagn/appcache/application_1552973526615_3878/container_e380_1552973526615_3878_01_000001/pyspark.zip/pyspark/sql/session.py",
>  line 545, in sql
>     return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>   File 
> "/DATA/fs11/hadoop/yarn/local/usercache/ingouagn/appcache/application_1552973526615_3878/container_e380_1552973526615_3878_01_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py",
>  line 1133, in __call__
>     answer, self.gateway_client, self.target_id, self.name)
>   File 
> "/DATA/fs11/hadoop/yarn/local/usercache/ingouagn/appcache/application_1552973526615_3878/container_e380_1552973526615_3878_01_000001/pyspark.zip/pyspark/sql/utils.py",
>  line 69, in deco
>     raise AnalysisException(s.split(': ', 1)[1], stackTrace)
> pyspark.sql.utils.AnalysisException: 'Table or view not found: q1; line 2 pos 
> 14'
> {code}
>  
> *Spark: 2.1.1*
> *Beeline: 1.2.1000*
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to