Re: Exceed scan threshold at 10000001

ShaoFeng Shi Thu, 27 Oct 2016 20:21:07 -0700

Alberto, thanks for your explanation. You've got the key points, and I
believe you're already a Kylin expert.


To protect HBase and Kylin from crashing under bad queries (queries that
scan too many rows), Kylin added this mechanism, which interrupts a query
once it reaches a threshold. In a typical OLAP scenario the result shouldn't
be that large, so hitting the limit is also a reminder to rethink the cube
design. If you really need a larger threshold, you can allocate more memory
to Kylin and set "kylin.query.mem.budget" to a bigger value.
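The guard described above can be pictured as a wrapper around the scan iterator that counts rows and aborts once the count passes the threshold. This is a minimal sketch, not Kylin's code: the class names (ScanGuardDemo, ThresholdIterator, ScanThresholdExceededException) are hypothetical. The only details taken from the thread are the message text and the fact that the error fires one row past the limit ("at 10000001" for what appears to be a 10,000,000-row threshold).

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.stream.LongStream;

// Hypothetical sketch of a scan-row guard; names are illustrative, not Kylin's API.
class ScanGuardDemo {

    // Thrown when a scan reads more rows than the configured threshold.
    static class ScanThresholdExceededException extends RuntimeException {
        ScanThresholdExceededException(long rowCount) {
            super("Exceed scan threshold at " + rowCount);
        }
    }

    // Wraps a row iterator and aborts the scan on the first row past the threshold.
    static class ThresholdIterator<T> implements Iterator<T> {
        private final Iterator<T> delegate;
        private final long threshold;
        private long scanned = 0;

        ThresholdIterator(Iterator<T> delegate, long threshold) {
            this.delegate = delegate;
            this.threshold = threshold;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public T next() {
            if (!delegate.hasNext()) {
                throw new NoSuchElementException();
            }
            scanned++;
            if (scanned > threshold) {
                // Fail fast instead of letting the scan run unbounded.
                throw new ScanThresholdExceededException(scanned);
            }
            return delegate.next();
        }

        long scanned() {
            return scanned;
        }
    }

    // Consumes every row through the guard and returns how many rows were read.
    static <T> long scanAll(Iterator<T> rows, long threshold) {
        ThresholdIterator<T> guarded = new ThresholdIterator<>(rows, threshold);
        while (guarded.hasNext()) {
            guarded.next();
        }
        return guarded.scanned();
    }

    public static void main(String[] args) {
        Iterator<Long> rows = LongStream.range(0, 25).boxed().iterator();
        try {
            scanAll(rows, 20);
        } catch (ScanThresholdExceededException e) {
            System.out.println(e.getMessage()); // prints "Exceed scan threshold at 21"
        }
    }
}
```

Counting in next() and failing only on the first row past the limit means a scan of exactly `threshold` rows still succeeds. Kylin's own check (CubeVisitService$2.hasNext in the stack trace further down) may differ in where and how it counts.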

2016-10-27 18:39 GMT+08:00 Alberto Ramón <[email protected]>:

> NOTE: I'm not an expert on Kylin ;)
>
> Is a WHERE clause mandatory? No.
> Is it recommended? Yes.
> Can you bypass the threshold? No, I think this limit is hardcoded (?)
>
> The real question should be: why does this limit exist? (my opinion)
> - Kylin targets real-time / near-real-time queries: limiting rows limits
> response time
> - If you are using JDBC, fetching this many rows performs poorly
> - It protects the HBase coprocessor
> - Perhaps you need a new dimension, to precalculate this aggregate or to
> filter by that new dimension
>
> For extra-large queries, you can also check:
> - kylin.query.mem.budget = 3GB
> - hbase.server.scanner.max.result.size = 100MB (an HBase limit; you can
> disable it with -1)
>
> Good Luck, Alb
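The sizes above are written as 3GB and 100MB. Assuming both properties are read as plain byte counts rather than suffixed strings (an assumption worth checking against the kylin.properties template and hbase-default.xml for your versions), the values need converting before they go into a config file. The ByteSizes class below is a hypothetical helper, not part of Kylin or HBase; it uses binary units (1 GB = 1024^3 bytes) and passes through a bare -1 unchanged.

```java
import java.util.Locale;

// Hypothetical helper: turns sizes such as "3GB" or "100MB" into plain byte counts.
// Binary units are assumed (1 KB = 1024 bytes). Bare numbers such as "-1" pass through.
class ByteSizes {

    static long toBytes(String size) {
        String s = size.trim().toUpperCase(Locale.ROOT);
        long multiplier = 1L;
        String digits = s;
        if (s.endsWith("GB")) {
            multiplier = 1024L * 1024 * 1024;
            digits = s.substring(0, s.length() - 2);
        } else if (s.endsWith("MB")) {
            multiplier = 1024L * 1024;
            digits = s.substring(0, s.length() - 2);
        } else if (s.endsWith("KB")) {
            multiplier = 1024L;
            digits = s.substring(0, s.length() - 2);
        } else if (s.endsWith("B")) {
            digits = s.substring(0, s.length() - 1);
        }
        return Long.parseLong(digits.trim()) * multiplier;
    }

    public static void main(String[] args) {
        // Print the lines as they would appear in a properties-style config file.
        System.out.println("kylin.query.mem.budget=" + toBytes("3GB"));
        System.out.println("hbase.server.scanner.max.result.size=" + toBytes("100MB"));
    }
}
```

Running main prints `kylin.query.mem.budget=3221225472` and `hbase.server.scanner.max.result.size=104857600`.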
>
> 2016-10-27 11:56 GMT+02:00 张磊 <[email protected]>:
>
> > Do you mean that when I query, I should add a WHERE clause?
> > But in some cases the number of records exceeds the threshold; what can I
> > do then? For example, when ordering by all groups, the number of groups
> > exceeds the threshold.
> >
> > ------------------ Original Message ------------------
> > From: "Alberto Ramón";<[email protected]>;
> > Sent: Thursday, October 27, 2016, 5:47 PM
> > To: "dev"<[email protected]>;
> >
> > Subject: Re: Exceed scan threshold at 10000001
> >
> >  ERROR: Scan row count exceeded threshold
> >
> > Mailing list thread:
> > <http://mail-archives.apache.org/mod_mbox/kylin-user/201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%2BamN2AX3hetnF0%3D9uFnow%40mail.gmail.com%3E>
> > Kylin issue KYLIN-1787 <https://issues.apache.org/jira/browse/KYLIN-1787>
> > (v1.5.3)
> >
> > *Scan row count exceeded threshold: 1000000, please add filter condition
> > to narrow down backend scan range, like where clause*
> >
> >
> > BR, Alb
> >
> > 2016-10-27 11:40 GMT+02:00 张磊 <[email protected]>:
> >
> > > Hi
> > >
> > >
> > > When I run a SQL query, I don't understand why it has to scan HBase.
> > > What can I do? Thanks!
> > >
> > >
> > > Table: lineorder (12,000,000 rows)
> > > Dimensions: LO_CUSTKEY,LO_PARTKEY
> > > Measures: count(1), sum(LO_REVENUE)
> > >
> > >
> > > Query SQL: select count(1),sum(LO_REVENUE) from lineorder group by
> > > LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
> > >
> > >
> > > I built a cube with two dimensions and two measures (count and sum); the
> > > HTable is 98 MB. When I execute the query in Insight, it shows "Error in
> > > coprocessor". I checked the HBase log and found the messages below:
> > >
> > >
> > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,queue=1,port=16020] gridtable.GTScanRequest: pre aggregation is not beneficial, skip it
> > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,queue=1,port=16020] endpoint.CubeVisitService: Scanned 1 rows from HBase.
> > >
> > >
> > > 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,queue=0,port=16020] endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
> > > 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,queue=0,port=16020] endpoint.CubeVisitService: The cube visit did not finish normally because scan num exceeds threshold
> > > org.apache.kylin.gridtable.GTScanExceedThresholdException: Exceed scan threshold at 10000001
> > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
> > >         at org.apache.kylin.storage.hbase.cube.v2.HBaseReadonlyStore$1$1.hasNext(HBaseReadonlyStore.java:111)
> > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
> > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(CubeVisitProtos.java:3952)
> > >         at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7815)
> > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1986)
> > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1968)
> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33652)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2178)
> > >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> > >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> > >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> > >         at java.lang.Thread.run(Thread.java:745)
> >
>
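The error message's advice ("add filter condition to narrow down backend scan range, like where clause") can be illustrated with a toy model. It assumes cube rows are stored sorted by a composite key (LO_CUSTKEY, LO_PARTKEY), roughly the way Kylin orders dimension values in its HBase rowkeys. The model is simplified: shard and cuboid-id prefixes and the real encoding are ignored, and RangeScanDemo and its methods are hypothetical. A GROUP BY over both dimensions with no WHERE clause must read every stored row, while WHERE LO_CUSTKEY = k reads only one contiguous key range.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (hypothetical): the base cuboid as a map sorted by a composite key
// that packs (custKey, partKey) into one long, mimicking rowkey order.
// Assumes non-negative keys so that numeric order matches (custKey, partKey) order.
class RangeScanDemo {

    static long key(int custKey, int partKey) {
        return ((long) custKey << 32) | (partKey & 0xFFFFFFFFL);
    }

    // Builds one row per (custKey, partKey) pair: customers x partsPerCustomer rows.
    static NavigableMap<Long, Long> buildCuboid(int customers, int partsPerCustomer) {
        NavigableMap<Long, Long> rows = new TreeMap<>();
        for (int c = 0; c < customers; c++) {
            for (int p = 0; p < partsPerCustomer; p++) {
                rows.put(key(c, p), 1L); // measure value, e.g. a count
            }
        }
        return rows;
    }

    // Full scan: what GROUP BY LO_CUSTKEY, LO_PARTKEY without a filter has to read.
    static long rowsScannedFull(NavigableMap<Long, Long> rows) {
        return rows.size();
    }

    // Range scan: what WHERE LO_CUSTKEY = custKey reads, since that customer's rows are contiguous.
    static long rowsScannedForCustomer(NavigableMap<Long, Long> rows, int custKey) {
        return rows.subMap(key(custKey, 0), true, key(custKey + 1, 0), false).size();
    }

    public static void main(String[] args) {
        NavigableMap<Long, Long> cuboid = buildCuboid(1000, 50);
        System.out.println("full scan rows:  " + rowsScannedFull(cuboid));            // 50000
        System.out.println("range scan rows: " + rowsScannedForCustomer(cuboid, 42)); // 50
    }
}
```

With the thread's numbers, the unfiltered query reads the whole base cuboid. That cuboid holds one row per distinct (LO_CUSTKEY, LO_PARTKEY) pair, so up to 12,000,000 rows, which is above the 10,000,000-row threshold that the "Exceed scan threshold at 10000001" message appears to imply.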



-- 
Best regards,

Shaofeng Shi 史少锋
