Re: Exceed scan threshold at 10000001

Alberto Ramón Thu, 27 Oct 2016 03:39:45 -0700

NOTE: I'm not an expert on Kylin ;)

Is a WHERE clause mandatory? No.
Is a WHERE clause recommended? Yes.
Can you bypass the threshold? No, I think this limit is hardcoded (not sure).


The real question should be: why does this limit exist? (my opinion)
- Kylin targets real-time / near-real-time responses; limiting the rows
scanned limits the response time
- If you are using JDBC, fetching this many rows is not a good option for
performance
- It protects the HBase coprocessor
- Perhaps you need a new dimension, to precalculate this aggregate or to
filter by this new dimension
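
A minimal sketch of the filtering idea, reusing the query from the original
mail below. The LO_CUSTKEY range is only an assumed example value, not
something taken from this thread; pick a range that fits your data:

```sql
-- Sketch only: the LO_CUSTKEY range is a hypothetical example value.
-- Filtering on a cube dimension narrows the backend scan range.
SELECT COUNT(1), SUM(LO_REVENUE)
FROM lineorder
WHERE LO_CUSTKEY BETWEEN 1 AND 1000
GROUP BY LO_CUSTKEY, LO_PARTKEY
ORDER BY LO_CUSTKEY, LO_PARTKEY
LIMIT 50
```

With a filter on a dimension, the scan should no longer need to read every
LO_CUSTKEY / LO_PARTKEY combination before the LIMIT is applied.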

For extra-large queries, you can also check:
 - kylin.query.mem.budget = 3GB
 - hbase.server.scanner.max.result.size = 100MB (a limit from HBase; you can
   disable it with -1)

Good Luck, Alb

2016-10-27 11:56 GMT+02:00 张磊 <[email protected]>:

> Do you mean that when I query, I should add a WHERE clause?
> But in some cases the number of records is greater than the threshold; what
> can I do? For example, an ORDER BY over all groups, where the number of
> groups > threshold.
>
>
>
>
> ------------------ Original message ------------------
> From: "Alberto Ramón";<[email protected]>;
> Sent: Thursday, 27 October 2016, 5:47 PM
> To: "dev"<[email protected]>;
>
> Subject: Re: Exceed scan threshold at 10000001
>
>
>
>  ERROR: Scan row count exceeded threshold
>
> Mailing list:
> <http://mail-archives.apache.org/mod_mbox/kylin-user/201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%2BamN2AX3hetnF0%3D9uFnow%40mail.gmail.com%3E>
> Kylin 1787 <https://issues.apache.org/jira/browse/KYLIN-1787> v1.5.3
>
> *Scan row count exceeded threshold: 1000000, please add filter condition to
> narrow down backend scan range, like where clause*
>
>
> BR, Alb
>
> 2016-10-27 11:40 GMT+02:00 张磊 <[email protected]>:
>
> > Hi
> >
> >
> > When I run a SQL query, I do not understand why it has to scan HBase. What can I do?
> > Thanks!
> >
> >
> > Table: lineorder  12,000,000 row records
> > Dimensions: LO_CUSTKEY,LO_PARTKEY
> > Measures: count(1), sum(LO_REVENUE)
> >
> >
> > Query SQL: select count(1),sum(LO_REVENUE) from lineorder group by
> > LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
> >
> >
> > I built a cube with two dimensions and two measures (count and sum); the
> > size of the HTable is 98 MB. When I execute a query in Insight, it shows
> > "Error in coprocessor", and when I check the HBase log, I find the messages below:
> >
> >
> > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,queue=1,port=16020] gridtable.GTScanRequest: pre aggregation is not beneficial, skip it
> > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,queue=1,port=16020] endpoint.CubeVisitService: Scanned 1 rows from HBase.
> >
> >
> > 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,queue=0,port=16020] endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
> > 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,queue=0,port=16020] endpoint.CubeVisitService: The cube visit did not finish normally because scan num exceeds threshold
> > org.apache.kylin.gridtable.GTScanExceedThresholdException: Exceed scan threshold at 10000001
> >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
> >         at org.apache.kylin.storage.hbase.cube.v2.HBaseReadonlyStore$1$1.hasNext(HBaseReadonlyStore.java:111)
> >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
> >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(CubeVisitProtos.java:3952)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7815)
> >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1986)
> >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1968)
> >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33652)
> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2178)
> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> >         at java.lang.Thread.run(Thread.java:745)
>
