[jira] [Commented] (KYLIN-5742) When the "Group by" group has duplicate values, the result of Grouping Set query is inconsistent with that in SparkSQL

ASF subversion and git services (Jira) Fri, 29 Mar 2024 03:33:23 -0700


    [ https://issues.apache.org/jira/browse/KYLIN-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832155#comment-17832155 ]


ASF subversion and git services commented on KYLIN-5742:
--------------------------------------------------------

Commit c396134127b17e741e6ead1197589afe7bb773d7 in kylin's branch 
refs/heads/kylin5 from fengguangyuan
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=c396134127 ]

KYLIN-5742 Make the query result of duplicate group sets same as Spark

Co-authored-by: Guangyuan Feng <guangyuan.f...@kyligence.io>


> When the "Group by" group has duplicate values, the result of Grouping Set 
> query is inconsistent with that in SparkSQL
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-5742
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5742
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: 5.0-beta
>            Reporter: zhong.zhu
>            Assignee: zhong.zhu
>            Priority: Major
>             Fix For: 5.0.0
>
>         Attachments: image-2023-12-11-14-54-38-652.png, 
> image-2023-12-11-14-55-46-222.png, image-2023-12-11-14-57-32-037.png, 
> image-2023-12-11-14-57-56-771.png
>
>
> {code:sql}
> -- sql1
> select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
> FROM SSB.LINEORDER as LINEORDER
> INNER JOIN SSB.CUSTOMER as CUSTOMER
> ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
> where C_NATION = 'CHINA' and C_CITY = 'CHINA    0'
> group by 
> GROUPING SETS ((),(C_NAME,C_CITY),(C_NATION,C_REGION))
> order by C_NAME;
> -- sql2
> select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
> FROM SSB.LINEORDER as LINEORDER
> INNER JOIN SSB.CUSTOMER as CUSTOMER
> ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
> where C_NATION = 'CHINA' and C_CITY = 'CHINA    0'
> group by 
> C_NAME,C_CITY,C_NATION,C_REGION,
> GROUPING SETS ((),(C_NAME,C_CITY),(C_NATION,C_REGION))
> order by C_NAME;
> -- sql3
> select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
> FROM SSB.LINEORDER as LINEORDER
> INNER JOIN SSB.CUSTOMER as CUSTOMER
> ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
> where C_NATION = 'CHINA' and C_CITY = 'CHINA    0'
> group by 
> C_NAME,C_CITY,C_NATION,C_REGION
> GROUPING SETS ((),(C_NAME,C_CITY),(C_NATION,C_REGION))
> order by C_NAME
> {code}
> In spark-sql, sql1 and sql3 return the same result, shown below:
>  !image-2023-12-11-14-54-38-652.png! 
> In spark-sql, the query result of sql2 is as follows:
>  !image-2023-12-11-14-55-46-222.png! 
> In KYLIN, the query result of sql1 is as follows, which is consistent with 
> the spark-sql result for sql1:
>  !image-2023-12-11-14-57-32-037.png! 
> In KYLIN, the query result of sql2 is as follows, which is inconsistent 
> with the spark-sql result for sql2:
>  !image-2023-12-11-14-57-56-771.png! 
> KYLIN does not support the syntax of sql3.
> Hive does not support a comma before GROUPING SETS, so sql2 is not 
> supported there; its query results for sql1 and sql3 are consistent with 
> spark-sql.
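
For readers comparing sql1 and sql2, here is a minimal sketch (not Kylin or Spark code) of the cross-product rule that, as far as I understand the SQL standard and the Spark SQL documentation, applies when a GROUP BY list mixes plain columns with GROUPING SETS. The helper name expand_group_by and its input encoding are hypothetical, introduced only for illustration.

```python
from itertools import product


def expand_group_by(items):
    """Expand a comma-separated GROUP BY list into effective grouping sets.

    Each item is either a column name (str), treated as a single grouping
    set containing that column, or a list of grouping sets (tuples of
    column names). The items are combined by cross product, and the
    columns of each combination are concatenated without repeats.
    """
    normalized = [
        [(item,)] if isinstance(item, str) else list(item) for item in items
    ]
    expanded = []
    for combination in product(*normalized):
        merged = []
        for grouping_set in combination:
            for column in grouping_set:
                if column not in merged:
                    merged.append(column)
        expanded.append(tuple(merged))
    return expanded


# The GROUPING SETS clause shared by sql1 and sql2.
grouping_sets = [(), ("C_NAME", "C_CITY"), ("C_NATION", "C_REGION")]

# sql1: GROUP BY GROUPING SETS (...)
sql1_sets = expand_group_by([grouping_sets])

# sql2: GROUP BY C_NAME, C_CITY, C_NATION, C_REGION, GROUPING SETS (...)
sql2_sets = expand_group_by(
    ["C_NAME", "C_CITY", "C_NATION", "C_REGION", grouping_sets]
)

print(sql1_sets)
print(sql2_sets)
```

Under this rule sql1 has three distinct grouping sets, while sql2 groups by all four columns three times over. Whether an engine emits those repeated sets once or three times is the behaviour this issue is about; the sketch simply keeps them.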



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
