[ 
https://issues.apache.org/jira/browse/KYLIN-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832155#comment-17832155
 ] 

ASF subversion and git services commented on KYLIN-5742:
--------------------------------------------------------

Commit c396134127b17e741e6ead1197589afe7bb773d7 in kylin's branch 
refs/heads/kylin5 from fengguangyuan
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=c396134127 ]

KYLIN-5742 Make the query result of duplicate group sets same as Spark

Co-authored-by: Guangyuan Feng <guangyuan.f...@kyligence.io>


> When the "Group by" group has duplicate values, the result of Grouping Set 
> query is inconsistent with that in SparkSQL
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-5742
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5742
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: 5.0-beta
>            Reporter: zhong.zhu
>            Assignee: zhong.zhu
>            Priority: Major
>             Fix For: 5.0.0
>
>         Attachments: image-2023-12-11-14-54-38-652.png, 
> image-2023-12-11-14-55-46-222.png, image-2023-12-11-14-57-32-037.png, 
> image-2023-12-11-14-57-56-771.png
>
>
> {code:sql}
> -- sql1
> select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
> FROM SSB.LINEORDER as LINEORDER
> INNER JOIN SSB.CUSTOMER as CUSTOMER
> ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
> where C_NATION = 'CHINA' and C_CITY = 'CHINA    0'
> group by 
> GROUPING SETS ((),(C_NAME,C_CITY),(C_NATION,C_REGION))
> order by C_NAME;
> -- sql2
> select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
> FROM SSB.LINEORDER as LINEORDER
> INNER JOIN SSB.CUSTOMER as CUSTOMER
> ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
> where C_NATION = 'CHINA' and C_CITY = 'CHINA    0'
> group by 
> C_NAME,C_CITY,C_NATION,C_REGION,
> GROUPING SETS ((),(C_NAME,C_CITY),(C_NATION,C_REGION))
> order by C_NAME;
> -- sql3
> select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
> FROM SSB.LINEORDER as LINEORDER
> INNER JOIN SSB.CUSTOMER as CUSTOMER
> ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
> where C_NATION = 'CHINA' and C_CITY = 'CHINA    0'
> group by 
> C_NAME,C_CITY,C_NATION,C_REGION
> GROUPING SETS ((),(C_NAME,C_CITY),(C_NATION,C_REGION))
> order by C_NAME
> {code}
> In spark-sql, sql1 and sql3 return the same result:
>  !image-2023-12-11-14-54-38-652.png! 
> In spark-sql, sql2 returns the following result:
>  !image-2023-12-11-14-55-46-222.png! 
> In Kylin, sql1 returns the following result, which is consistent with the 
> spark-sql result for sql1:
>  !image-2023-12-11-14-57-32-037.png! 
> In Kylin, sql2 returns the following result, which is inconsistent with the 
> spark-sql result for sql2:
>  !image-2023-12-11-14-57-56-771.png! 
> Kylin does not support the syntax of sql3.
> Hive does not support a comma before GROUPING SETS, so sql2 is not 
> supported there; the Hive results for sql1 and sql3 are consistent with 
> spark-sql.
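
A rough sketch of why sql2 can differ from sql1, assuming the cross-product rule that the Spark SQL documentation describes for a GROUP BY that mixes plain expressions with GROUPING SETS: each grouping set is combined with the plain columns. The helper name expand_group_by is hypothetical and is not taken from Kylin or Spark. Under that assumption, sql2 expands to three identical grouping sets, so the result depends on whether an engine keeps or collapses the duplicates. The attached screenshots, not this sketch, remain the record of the actual output.

```python
def expand_group_by(plain_columns, grouping_sets):
    """Combine every grouping set with the plain GROUP BY columns.

    Repeated columns inside one combined set are dropped, but duplicate
    sets are deliberately kept, since they are the subject of the issue.
    """
    expanded = []
    for grouping_set in grouping_sets:
        merged = list(plain_columns)
        for column in grouping_set:
            if column not in merged:
                merged.append(column)
        expanded.append(tuple(merged))
    return expanded


# Grouping sets shared by sql1 and sql2 in the issue description.
sets = [(), ("C_NAME", "C_CITY"), ("C_NATION", "C_REGION")]
# Plain columns that sql2 lists before the comma.
plain = ["C_NAME", "C_CITY", "C_NATION", "C_REGION"]

sql1_sets = expand_group_by([], sets)     # no plain columns: sets unchanged
sql2_sets = expand_group_by(plain, sets)  # every set grows to all four columns

for name, result in (("sql1", sql1_sets), ("sql2", sql2_sets)):
    print(name, result)
```

In this sketch sql1 keeps three distinct grouping sets, while sql2 ends up with the same four-column set three times.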



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
