[ 
https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735388#comment-16735388
 ] 

ASF GitHub Bot commented on KYLIN-3722:
---------------------------------------

shaofengshi commented on pull request #397: KYLIN-3722 Disable limit push down 
for complex query
URL: https://github.com/apache/kylin/pull/397
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Error Limit Push Down in Join Related Query
> -------------------------------------------
>
>                 Key: KYLIN-3722
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3722
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: all
>            Reporter: XiaoXiang Yu
>            Assignee: XiaoXiang Yu
>            Priority: Major
>              Labels: LimitPushDown
>             Fix For: v2.6.0
>
>         Attachments: 
> 0001-storageLimitLevel-set-to-NO_LIMIT-when-the-query-has.patch, 
> image-2018-12-16-17-06-16-341.png, image-2018-12-16-17-24-21-017.png, 
> image-2018-12-16-17-38-13-454.png, image-2018-12-16-22-40-36-857.png, 
> image-2018-12-16-22-44-32-875.png, image-2018-12-16-22-45-29-841.png, 
> image-2018-12-16-22-46-43-168.png, image-2018-12-16-22-48-25-622.png, 
> image-2018-12-16-22-54-25-718.png, image-2018-12-16-22-55-23-827.png, 
> image-2018-12-16-22-56-27-258.png, image-2018-12-16-22-56-50-607.png, 
> image-2018-12-16-22-58-04-486.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> +*Kylin limit-pushdown is sometimes cause data reduction.*+
> For example:
> {quote}select uid, sum(active_minutes) as am
>  from useraction
>  where item_id in (
>      select distinct item_id
>      from iteminfo
>      where item_type in ('Video')
>  ) and act_type != 'share'
>  group by uid
>  limit 10
> {quote}
> +*In hive, we got correct result(Five row).*+
> {quote}hive>
>  > select uid, sum(active_minutes) as am
>  > from useraction
>  > where item_id in (
>  > select distinct item_id
>  > from iteminfo
>  > where item_type in ('Video')
>  > ) and act_type != 'share'
>  > group by uid
>  > limit 10;
>  Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e
>  Total jobs = 1
>  Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1539833412107_0414)
> --------------------------------------------------------------------------------
>  VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
>  
> --------------------------------------------------------------------------------
>  Map 1 .......... SUCCEEDED 1 1 0 0 0 0
>  Map 3 .......... SUCCEEDED 1 1 0 0 0 0
>  Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
>  Reducer 4 ...... SUCCEEDED 1 1 0 0 0 0
>  
> --------------------------------------------------------------------------------
>  VERTICES: 04/04 [==========================>>] 100% ELAPSED TIME: 7.67 s
>  
> --------------------------------------------------------------------------------
>  OK
>  1 14565.470000000008
>  2 64744.89000000003
>  3 64939.01999999984
>  5 36563.76999999997
>  6 36641.64999999999
>  Time taken: 11.02 seconds, Fetched: 5 row(s)
> {quote}
> +*In Kylin, same query got error result(only THREE row). But when you set 
> limit to 50000(original value). It is OK.*+
> !image-2018-12-16-17-06-16-341.png!
>  
>  
> We can find following things in log:
> {quote} 
> KYLIN [ DEBUG ] 12-16 17:04:28.299 
> org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433)
>  from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not 
> clustered at head, groupsD: \{0} with cuboid columns: \{1}
> KYLIN [ INFO ] 12-16 17:04:28.299 
> org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167)
>  from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE
> {quote}
>  
> {quote}KYLIN [ INFO ] 12-16 17:04:28.405 
> org.apache.kylin.rest.service.QueryService.logQuery(QueryService.java:352) 
> from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>  > 
>  ==========================[QUERY]===============================
>  Query Id: 78808744-8324-3ad4-58ac-93ad7cd8a708
>  SQL: select uid, sum(active_minutes) as am
>  from useraction
>  where item_id in (
>  select distinct item_id
>  from iteminfo
>  where item_type in ('Video')
>  ) and act_type != 'share'
>  group by uid
>  User: ADMIN
>  Success: true
>  Duration: 0.202
>  Project: PearVideo
>  Realization Names: [CUBE[name=PearVideoCube1], CUBE[name=PearVideoCube1]]
>  Cuboid Ids: [14]
>  Total scan count: 120
>  Total scan bytes: 6442
>  Result row count: 3
>  Accept Partial: true
>  Is Partial Result: false
>  Hit Exception Cache: false
>  Storage cache used: false
>  Is Query Push-Down: false
>  Is Prepare: false
>  Trace URL: null
>  Message: null
>  ==========================[QUERY]=============================== 
> {quote}
>  
> h3. Execution Plan 
> {quote}EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>  OLAPLimitRel(ctx=[], fetch=[1])
>  OLAPAggregateRel(group=[\{0}], AM=[SUM($1)], ctx=[])
>  OLAPProjectRel(UID=[$0], ACTIVE_MINUTES=[$4], ctx=[])
>  OLAPFilterRel(condition=[<>($1, 'share')], ctx=[])
>  OLAPJoinRel(condition=[=($2, $9)], joinType=[inner], ctx=[])
>  OLAPTableScan(table=[[DEMO_USER_ACT, USERACTION]], ctx=[], fields=[[0, 1, 2, 
> 3, 4, 5, 6, 7, 8]])
>  OLAPAggregateRel(group=[\{0}], ctx=[])
>  OLAPProjectRel(ITEM_ID=[$0], ctx=[])
>  OLAPFilterRel(condition=[=($1, 'Video')], ctx=[])
>  OLAPTableScan(table=[[DEMO_USER_ACT, ITEMINFO]], ctx=[], fields=[[0, 1]])
> EXECUTION PLAN AFTER OLAPCONTEXT IS SET
> OLAPToEnumerableConverter
>  OLAPLimitRel(ctx=[0@null], fetch=[1])
>  OLAPAggregateRel(group=[\{0}], AM=[SUM($1)], ctx=[0@null])
>  OLAPProjectRel(UID=[$0], ACTIVE_MINUTES=[$4], ctx=[0@null])
>  OLAPFilterRel(condition=[<>($1, 'share')], ctx=[0@null])
>  OLAPJoinRel(condition=[=($2, $9)], joinType=[inner], ctx=[0@null])
>  OLAPTableScan(table=[[DEMO_USER_ACT, USERACTION]], ctx=[0@null], fields=[[0, 
> 1, 2, 3, 4, 5, 6, 7, 8]])
>  OLAPAggregateRel(group=[\{0}], ctx=[1@null])
>  OLAPProjectRel(ITEM_ID=[$0], ctx=[1@null])
>  OLAPFilterRel(condition=[=($1, 'Video')], ctx=[1@null])
>  OLAPTableScan(table=[[DEMO_USER_ACT, ITEMINFO]], ctx=[1@null], fields=[[0, 
> 1]])
> {quote}
>  
> This bug was reported by Meituan's Dev [~kangkaisen].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to