[ 
https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaoXiang Yu updated KYLIN-3722:
--------------------------------
    Description: 
+*Kylin limit pushdown sometimes causes rows to be dropped from the query result.*+

For example:
{quote}select uid, sum(active_minutes) as am
 from useraction
 where item_id in (
     select distinct item_id
     from iteminfo
     where item_type in ('Video')
 ) and act_type != 'share'
 group by uid
 limit 10
{quote}
+*In Hive, we got the correct result (five rows).*+
{quote}hive>
 > select uid, sum(active_minutes) as am
 > from useraction
 > where item_id in (
 > select distinct item_id
 > from iteminfo
 > where item_type in ('Video')
 > ) and act_type != 'share'
 > group by uid
 > limit 10;
 Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e
 Total jobs = 1
 Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1539833412107_0414)

--------------------------------------------------------------------------------
 VERTICES         STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
 Map 1 .......... SUCCEEDED      1          1        0        0       0       0
 Map 3 .......... SUCCEEDED      1          1        0        0       0       0
 Reducer 2 ...... SUCCEEDED      1          1        0        0       0       0
 Reducer 4 ...... SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
 VERTICES: 04/04 [==========================>>] 100% ELAPSED TIME: 7.67 s
--------------------------------------------------------------------------------
 OK
 1 14565.470000000008
 2 64744.89000000003
 3 64939.01999999984
 5 36563.76999999997
 6 36641.64999999999
 Time taken: 11.02 seconds, Fetched: 5 row(s)
{quote}
+*In Kylin, the same query returned an incorrect result (only three rows). But when the limit 
is set to 50000 (the original value), the result is correct.*+

!image-2018-12-16-17-06-16-341.png!

This error was reported by Meituan's developers.

 

We can find the following in the log:
{quote} 

KYLIN [ DEBUG ] 12-16 17:04:28.299 
org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433)
 from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
 > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not 
 > clustered at head, groupsD: \{0} with cuboid columns: \{1}

KYLIN [ INFO ] 12-16 17:04:28.299 
org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167)
 from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
 > Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE
{quote}
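
The report does not state the root cause, so the following is only a toy model of one way a limit pushed down to storage could lose groups: if the limit truncates the scanned rows before a later filter (such as the item_id subquery) and the aggregation are applied, fewer groups survive. The function `aggregate`, the parameter `storage_limit`, and the sample rows are hypothetical and are not Kylin code or data from this issue.

```python
from collections import defaultdict


def aggregate(rows, allowed_items, storage_limit=None):
    """Toy pipeline: scan -> optional storage-level limit -> filter -> group by uid.

    This is an assumed illustration, not Kylin's actual execution path.
    """
    # Hypothetical pushdown: the limit cuts the scan before filtering happens.
    scanned = rows if storage_limit is None else rows[:storage_limit]
    totals = defaultdict(float)
    for uid, item_id, minutes in scanned:
        # Filter applied after the scan, like the item_id IN (...) predicate.
        if item_id in allowed_items:
            totals[uid] += minutes
    return dict(totals)


# Made-up rows: (uid, item_id, active_minutes)
rows = [
    (1, "a", 10.0), (2, "x", 5.0), (3, "x", 7.0),
    (2, "a", 3.0), (5, "b", 4.0), (6, "a", 1.0),
]
video_items = {"a", "b"}

full = aggregate(rows, video_items)                       # no pushdown
truncated = aggregate(rows, video_items, storage_limit=3)  # limit applied too early

print(len(full), len(truncated))
```

In this sketch both results are below any reasonable query limit, yet the truncated run returns fewer groups, which is the same symptom as the three rows versus five rows described above.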
 

  was:
+*Kylin limit pushdown sometimes causes rows to be dropped from the query result.*+

For example:
{quote}select uid, sum(active_minutes) as am
 from useraction
 where item_id in (
     select distinct item_id
     from iteminfo
     where item_type in ('Video')
 ) and act_type != 'share'
 group by uid
 limit 10
{quote}
+*In Hive, we got the correct result (five rows).*+
{quote}hive>
 > select uid, sum(active_minutes) as am
 > from useraction
 > where item_id in (
 > select distinct item_id
 > from iteminfo
 > where item_type in ('Video')
 > ) and act_type != 'share'
 > group by uid
 > limit 10;
 Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e
 Total jobs = 1
 Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1539833412107_0414)

--------------------------------------------------------------------------------
 VERTICES         STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
 Map 1 .......... SUCCEEDED      1          1        0        0       0       0
 Map 3 .......... SUCCEEDED      1          1        0        0       0       0
 Reducer 2 ...... SUCCEEDED      1          1        0        0       0       0
 Reducer 4 ...... SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
 VERTICES: 04/04 [==========================>>] 100% ELAPSED TIME: 7.67 s
--------------------------------------------------------------------------------
 OK
 1 14565.470000000008
 2 64744.89000000003
 3 64939.01999999984
 5 36563.76999999997
 6 36641.64999999999
 Time taken: 11.02 seconds, Fetched: 5 row(s)
{quote}
+*In Kylin, the same query returned an incorrect result (only three rows). But when the limit 
is set to 50000 (the original value), the result is correct.*+

!image-2018-12-16-17-06-16-341.png!

This error was reported by Meituan's developers.

 

We can find the following in the log:
{quote}KYLIN [ DEBUG ] 12-16 16:56:43.431 
org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433)
 from Query f22de22d-764e-efef-26f3-6bd9f38bc826-83
 > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not 
 > clustered at head, groupsD: \{0} with cuboid columns: \{1}

KYLIN [ INFO ] 12-16 16:56:43.431 
org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167)
 from Query f22de22d-764e-efef-26f3-6bd9f38bc826-83
 > Enabling limit push down: 50000 at level: LIMIT_ON_RETURN_SIZE
{quote}
 


> Error Limit Push Down
> ---------------------
>
>                 Key: KYLIN-3722
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3722
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: all
>            Reporter: XiaoXiang Yu
>            Assignee: XiaoXiang Yu
>            Priority: Major
>              Labels: LimitPushDown
>             Fix For: v2.6.0
>
>         Attachments: image-2018-12-16-17-06-16-341.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
