In my opinion, you could try increasing the ehcache size, which is set in ehcache.xml or ehcache-test.xml depending on your "profile" setting. I see the memoryStoreEvictionPolicy there is "LRU", but I am not clear on how it works here; I hope someone can explain it.

You can avoid the automatically added limit by querying through JDBC instead of the web UI. Also, I do not find any cache in the Kylin storage engine, so it always fetches results from HBase; maybe that is a point for optimization.

I checked my result with 'select xxx from xxx order by xxx desc limit 10' and got the correct result. As far as I know, Kylin takes all data out of the storage engine for a join or table scan, so it is strange that "order by ... desc limit" returns a wrong result.
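On the cache size, here is a rough sketch of what the relevant part of ehcache.xml might look like. The cache name, sizes and timeouts are only illustrative assumptions, not values taken from Kylin's shipped config; the <sizeOfPolicy> line just follows what the Ehcache warning you quoted suggests (raise maxDepth, or set maxDepthExceededBehavior to "abort").

<ehcache maxBytesLocalHeap="512M" updateCheck="false">
    <!-- Raise the sizing depth limit, as the Ehcache warning suggests;
         10000 is only an illustrative value, not a recommendation. -->
    <sizeOfPolicy maxDepth="10000" maxDepthExceededBehavior="continue"/>

    <!-- Cache name and settings are placeholders; reuse whatever cache
         definitions already exist in your ehcache.xml / ehcache-test.xml. -->
    <cache name="StorageCache"
           eternal="false"
           timeToIdleSeconds="86400"
           memoryStoreEvictionPolicy="LRU"/>
</ehcache>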
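For the auto-added limit, a minimal JDBC sketch is below. It assumes the Kylin JDBC driver class name and URL format; the host, port, project, credentials and the table/column names are placeholders, and the paging syntax is the one from your mail.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class KylinJdbcExample {
    public static void main(String[] args) throws Exception {
        // Kylin JDBC driver; the URL below points at a placeholder
        // host/port/project.
        Class.forName("org.apache.kylin.jdbc.Driver");
        Properties props = new Properties();
        props.setProperty("user", "ADMIN");
        props.setProperty("password", "KYLIN");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:kylin://localhost:7070/my_project", props);
             Statement stmt = conn.createStatement();
             // Going through JDBC instead of the web UI, so no automatic
             // 'limit 50000' is appended; paging syntax as discussed in
             // this thread.
             ResultSet rs = stmt.executeQuery(
                 "select xxx from xxx order by xxx desc "
                 + "offset 20 rows fetch next 10 rows only")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}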
2015-12-21 19:44 GMT+08:00 Yerui Sun <[email protected]>:
> Hi all,
> We met a scenario with strict performance requirements. After some code
> reading of the query service and engine, here are some questions:
>
> 1. The cache doesn't work.
> Assuming a query with 11000 result rows, the first execution consumes 5+
> seconds, and executing it again immediately still consumes 5 seconds.
> Looking into the query log, both queries have 'Hit Cache: false'. The
> duration threshold has been set to 3000, and the result count threshold
> is 10000. With additional logging, the two queries have the same hash
> code, and both are put into the cache. But there is a warning log as
> follows, which may or may not be the reason:
> [WARN][net.sf.ehcache.pool.sizeof.ObjectGraphWalker.checkMaxDepth(ObjectGraphWalker.java:209)]
> - The configured limit of 1,000 object references was reached while
> attempting to calculate the size of the object graph. Severe performance
> degradation could occur if the sizing operation continues. This can be
> avoided by setting the CacheManger or Cache <sizeOfPolicy> elements
> maxDepthExceededBehavior to "abort" or adding stop points with
> @IgnoreSizeOf annotations. If performance degradation is NOT an issue at
> the configured limit, raise the limit value using the CacheManager or Cache
> <sizeOfPolicy> elements maxDepth attribute. For more information, see the
> Ehcache configuration documentation.
>
> 2. How does offset (query with paging) work?
> The user wants to query Kylin with pages, like MySQL. I've found that
> writing 'limit 100000 offset 20 row fetch next 10 row only' works, meaning
> fetch the next 10 rows starting from the 21st row, limited to at most
> 100000. In fact, the limit is not necessary for the user; the reason to
> add it is to avoid the automatically added 'limit 50000', so is it
> possible to disable that auto-adding?
> The second question is the processing of a query with offset. One
> possible way is to scan the first 30 (or 100000) rows and filter out only
> rows 21~30 in the Calcite engine. Another way is to jump to the 21st row
> directly and scan out only the next 10 rows. I guess we process it the
> first way for now, because there is no way to jump directly to a specific
> row in HBase. If my guess is correct, what about the next query with
> 'limit 100000 offset 30 row fetch next 10 row only'? Do we need to scan
> out all results again? If yes, a cache here should accelerate subsequent
> offset queries.
>
> 3. How does 'order by desc' work?
> 'Order by' is also used, but some results don't seem right with 'order by
> desc'. Consider a query with 10000 result rows: adding 'limit 5000 order
> by desc', the result should be rows 10000~5001, but is rows 5000~1 in
> fact. I guess the reason is that the ordering is processed in the Calcite
> engine after the results are scanned, and the HBase scan is always
> ascending, which is why we get rows 1~5000. To resolve this, is it
> possible to push the order by down into the HBase scan and turn it into a
> reverse scan?
>
> I'm not very familiar with the query service and engine for now, but it
> feels like there is still some room for performance improvement.
> Looking forward to any comments.
>
>
> Best Regards,
> Yerui Sun
> [email protected]
