[ https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722492#comment-16722492 ]
XiaoXiang Yu commented on KYLIN-3722:
-------------------------------------

h2. Analysis of cause

todo

> Error Limit Push Down
> ---------------------
>
>                 Key: KYLIN-3722
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3722
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: all
>            Reporter: XiaoXiang Yu
>            Assignee: XiaoXiang Yu
>            Priority: Major
>              Labels: LimitPushDown
>             Fix For: v2.6.0
>
>         Attachments: image-2018-12-16-17-06-16-341.png, image-2018-12-16-17-24-21-017.png, image-2018-12-16-17-38-13-454.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> +*Kylin limit push-down sometimes causes rows to be missing from the query result.*+
> For example:
> {quote}select uid, sum(active_minutes) as am
> from useraction
> where item_id in (
>   select distinct item_id
>   from iteminfo
>   where item_type in ('Video')
> ) and act_type != 'share'
> group by uid
> limit 10
> {quote}
> +*In Hive, the query returns the correct result (five rows).*+
> {quote}hive>
>     > select uid, sum(active_minutes) as am
>     > from useraction
>     > where item_id in (
>     >   select distinct item_id
>     >   from iteminfo
>     >   where item_type in ('Video')
>     > ) and act_type != 'share'
>     > group by uid
>     > limit 10;
> Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id application_1539833412107_0414)
> --------------------------------------------------------------------------------
> VERTICES         STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --------------------------------------------------------------------------------
> Map 1 .......... SUCCEEDED      1          1        0        0       0       0
> Map 3 .......... SUCCEEDED      1          1        0        0       0       0
> Reducer 2 ...... SUCCEEDED      1          1        0        0       0       0
> Reducer 4 ...... SUCCEEDED      1          1        0        0       0       0
> --------------------------------------------------------------------------------
> VERTICES: 04/04 [==========================>>] 100% ELAPSED TIME: 7.67 s
> --------------------------------------------------------------------------------
> OK
> 1 14565.470000000008
> 2 64744.89000000003
> 3 64939.01999999984
> 5 36563.76999999997
> 6 36641.64999999999
> Time taken: 11.02 seconds, Fetched: 5 row(s)
> {quote}
> +*In Kylin, the same query returns a wrong result (only three rows). When the limit is set to 50000 (the original value), the result is correct.*+
> !image-2018-12-16-17-06-16-341.png!
> This error was reported by Meituan's Dev.
>
> The log shows the following:
> {quote}
> KYLIN [ DEBUG ] 12-16 17:04:28.299 org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433) from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
> storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not clustered at head, groupsD: \{0} with cuboid columns: \{1}
> KYLIN [ INFO ] 12-16 17:04:28.299 org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167) from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
> Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE
> {quote}
>
> {quote}KYLIN [ INFO ] 12-16 17:04:28.405 org.apache.kylin.rest.service.QueryService.logQuery(QueryService.java:352) from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>
> ==========================[QUERY]===============================
> Query Id: 78808744-8324-3ad4-58ac-93ad7cd8a708
> SQL: select uid, sum(active_minutes) as am
> from useraction
> where item_id in (
>   select distinct item_id
>   from iteminfo
>   where item_type in ('Video')
> ) and act_type != 'share'
> group by uid
> User: ADMIN
> Success: true
> Duration: 0.202
> Project: PearVideo
> Realization Names: [CUBE[name=PearVideoCube1], CUBE[name=PearVideoCube1]]
> Cuboid Ids: [14]
> Total scan count: 120
> Total scan bytes: 6442
> Result row count: 3
> Accept Partial: true
> Is Partial Result: false
> Hit Exception Cache: false
> Storage cache used: false
> Is Query Push-Down: false
> Is Prepare: false
> Trace URL: null
> Message: null
> ==========================[QUERY]===============================
> {quote}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
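
The symptom reported above (five rows expected, three returned once the limit is pushed down) can be reproduced with a toy model. The sketch below is not Kylin code and is not a diagnosis of this issue, whose cause is still marked todo. It only assumes, for illustration, that a limit applied at the storage scan runs before a later filter and aggregation. The names `aggregate`, `query`, and `storage_limit` and the sample rows are hypothetical.

```python
from collections import defaultdict

def aggregate(rows, allowed_items):
    """Sum minutes per uid, keeping only rows whose item_id passes the filter."""
    totals = defaultdict(float)
    for uid, item_id, minutes in rows:
        if item_id in allowed_items:
            totals[uid] += minutes
    return dict(totals)

def query(rows, allowed_items, limit, storage_limit=None):
    """Hypothetical query path.

    storage_limit=None models the correct plan: scan everything, filter,
    aggregate, then apply LIMIT to the grouped result.
    A non-None storage_limit models an unsafe push-down: the scan is cut
    off before the filter and the aggregation have run.
    """
    scanned = rows if storage_limit is None else rows[:storage_limit]
    totals = aggregate(scanned, allowed_items)
    return sorted(totals.items())[:limit]

# Made-up sample data: (uid, item_id, active_minutes)
rows = [
    (1, "a", 10.0), (1, "x", 5.0), (2, "x", 7.0), (2, "a", 3.0),
    (3, "b", 4.0), (4, "x", 9.0), (5, "a", 1.0), (6, "b", 2.0),
]
video_items = {"a", "b"}  # stands in for the IN (subquery) filter

correct = query(rows, video_items, limit=5)
pushed = query(rows, video_items, limit=5, storage_limit=5)

print(len(correct), correct)
print(len(pushed), pushed)
```

In this model the pushed-down scan stops after five raw rows, so groups that only appear later in the scan never reach the aggregation, even though the final LIMIT would have admitted them. Raising the limit well above the table size hides the problem, which matches the observation that limit 50000 returns the correct result.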