[ https://issues.apache.org/jira/browse/KYLIN-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wang closed KYLIN-5203.
-----------------------
Resolution: Won't Do
> The same SQL query returns inconsistent results in Kylin and Hive
> -----------------------------------------------------------------
>
> Key: KYLIN-5203
> URL: https://issues.apache.org/jira/browse/KYLIN-5203
> Project: Kylin
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: v3.1.2
> Reporter: wang
> Priority: Blocker
>
> SQL (SUM, COUNT), with the T9 filter:
> SELECT
> SUM(t1.a1),
> COUNT(1)
> FROM
> T1 JOIN T2 ON...
> JOIN T3 ON...
> JOIN T4 ON...
> ...
> JOIN T9 ON...
> WHERE
> T1.c1 = '10000'
> AND T1.date BETWEEN '2022-06-11' AND '2022-06-21'
> AND {color:#ff0000}T9.b_type IN ('7', '11', '12');{color}
> Result:
> ||Engine||SUM(t1.a1)||COUNT(1)||
> |Hive|2134980.9451|36330|
> |Kylin|1135892.3346|19765|
> h3. After removing the T9 filter:
> SELECT
> SUM(t1.a1),
> COUNT(1)
> FROM
> T1 JOIN T2 ON...
> JOIN T3 ON...
> JOIN T4 ON...
> ...
> JOIN T9 ON...
> WHERE
> T1.c1 = '10000'
> AND T1.date BETWEEN '2022-06-11' AND '2022-06-21';
> Result:
> ||Engine||SUM(t1.a1)||COUNT(1)||
> |Hive|3184089.5551|65333|
> |Kylin|3184089.5551|65333|
> In theory, Hive and Kylin should return the same results. Without the filter
> on table T9 the results match, but once the filter is added, Kylin loses rows.
> Environment:
> Hive:
> There are nine tables in total. The main table (the fact table) is a
> partitioned table. Of the other eight tables, two are large tables on the
> order of ten million rows, and the rest are dimension tables, stored as
> bucketed tables.
> Kylin:
> Create Intermediate Flat Hive Table
> Redistribute Flat Hive Table
> Extract Fact Table Distinct Columns(Map Input)
> Segment:
> Source Count: ???
> According to the log, the row count is the same.
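The size of the discrepancy can be read directly off the two result tables. The short Python sketch below is a minimal illustration, not part of any Kylin or Hive tooling: the names `with_t9_filter`, `without_t9_filter`, and `gap` are hypothetical, and the values are simply copied from the reported results. It computes the Hive-minus-Kylin difference for each query, using `Decimal` so the SUM values are compared exactly.

```python
from decimal import Decimal

# Reported results, copied from the tables: engine -> (SUM(t1.a1), COUNT(1)).
with_t9_filter = {
    "Hive": (Decimal("2134980.9451"), 36330),
    "Kylin": (Decimal("1135892.3346"), 19765),
}
without_t9_filter = {
    "Hive": (Decimal("3184089.5551"), 65333),
    "Kylin": (Decimal("3184089.5551"), 65333),
}


def gap(results):
    """Hypothetical helper: return (SUM difference, row difference), Hive minus Kylin."""
    hive_sum, hive_count = results["Hive"]
    kylin_sum, kylin_count = results["Kylin"]
    return hive_sum - kylin_sum, hive_count - kylin_count


# Print the gap for each query, with the missing rows as a share of Hive's count.
for label, results in (("with T9 filter", with_t9_filter),
                       ("without T9 filter", without_t9_filter)):
    sum_gap, count_gap = gap(results)
    hive_count = results["Hive"][1]
    print(f"{label}: sum gap = {sum_gap}, row gap = {count_gap} "
          f"({count_gap / hive_count:.1%} of Hive rows)")
```

With the T9 filter, Kylin returns 16565 fewer rows than Hive (about 45.6% of Hive's 36330) and 999088.6105 less in SUM(t1.a1); without the filter, both differences are zero. The arithmetic only restates the reported numbers and says nothing about the cause.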
--
This message was sent by Atlassian Jira
(v8.20.10#820010)