Hi all,
We've run into a scenario with strict performance requirements. After some code
reading of the query service and engine, here are a few questions:
1. The cache doesn’t work.
Assuming a query with 11,000 result rows, the first execution takes 5+
seconds, and executing it again immediately still takes 5 seconds. Looking into
the query log, both executions show 'Hit Cache: false'. The duration threshold
has been set to 3000, and the result count threshold to 10000. With additional
logging, the two queries have the same hash code and both were put into the
cache. But there is a warning log as follows, and I'm not sure whether it's the reason:
[WARN][net.sf.ehcache.pool.sizeof.ObjectGraphWalker.checkMaxDepth(ObjectGraphWalker.java:209)]
- The configured limit of 1,000 object references was reached while attempting
to calculate the size of the object graph. Severe performance degradation could
occur if the sizing operation continues. This can be avoided by setting the
CacheManger or Cache <sizeOfPolicy> elements maxDepthExceededBehavior to
"abort" or adding stop points with @IgnoreSizeOf annotations. If performance
degradation is NOT an issue at the configured limit, raise the limit value
using the CacheManager or Cache <sizeOfPolicy> elements maxDepth attribute. For
more information, see the Ehcache configuration documentation.
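As an experiment, the limit mentioned in the warning could be adjusted through
the <sizeOfPolicy> element the message refers to. A minimal sketch, assuming the
ehcache.xml that Kylin's query server loads (the maxDepth value here is only a
placeholder, not a recommendation):

    <ehcache>
        <!-- Raise the object-graph sizing limit named in the warning,
             or set maxDepthExceededBehavior="abort" to stop sizing early
             instead of risking the degradation it describes. -->
        <sizeOfPolicy maxDepth="100000" maxDepthExceededBehavior="continue"/>
    </ehcache>

That said, I haven't confirmed whether this sizing warning is actually what
prevents the entries from being served from the cache.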
2. How does offset (paged query) work?
    The user wants to query Kylin with pagination, like MySQL. I've found that
a query written as 'limit 100000 offset 20 row fetch next 10 row only' works,
meaning it fetches the next 10 rows starting from the 21st row, with an upper
limit of 100000. In fact, the limit is not necessary for the user; the only
reason to add it is to avoid the automatic 'limit 50000' being appended. Is it
possible to disable that automatic limit? (A cleaned-up sketch of the paging
syntax follows below.)
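For reference, a minimal sketch of the OFFSET ... FETCH paging syntax as I
understand Calcite accepts it; the table and column names (kylin_sales,
part_dt, price) are just the Kylin sample schema used for illustration, not
our real query:

    -- skip the first 20 rows, return the next 10 (rows 21~30)
    SELECT part_dt, SUM(price) AS total_sold
    FROM kylin_sales
    GROUP BY part_dt
    ORDER BY part_dt
    OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;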
    The second question is how a query with offset is processed. One possible
way is to scan the first 30 (or 100000) rows and let the Calcite engine filter
out only rows 21~30. Another way is to jump to the 21st row directly and scan
out only the next 10 rows. I guess we process it in the first way for now,
because there's no way to jump directly to a specific row in HBase. If my guess
is correct, what about the next query, 'limit 100000 offset 30 row fetch next
10 row only'? Do we need to scan out all the results again? If so, a cache here
should accelerate the subsequent offset queries.
3. How does 'order by desc' work?
    The 'order by' clause is also used, but some results don't look right with
'order by ... desc'. Considering a query with 10000 result rows, adding
'limit 5000' with 'order by ... desc', the result should be rows 10000~5001,
but is rows 5000~1 in fact. I guess the reason is that the ordering is
processed in the Calcite engine after the scan, and the HBase scan is always
ascending, which is why we get rows 1~5000. To resolve this, is it possible to
push the 'order by' down into the HBase scan and turn it into a reverse scan?
(A sketch of the query shape follows below.)
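To make the query shape concrete, here is a hedged sketch using the same
hypothetical sample-schema names as above; the intent is to get the descending
head of the result (rows 10000~5001), not the ascending head sorted descending:

    -- expected: the 5000 largest part_dt groups (rows 10000~5001 of the full result)
    -- observed: the 5000 smallest ones (rows 5000~1), sorted in descending order
    SELECT part_dt, SUM(price) AS total_sold
    FROM kylin_sales
    GROUP BY part_dt
    ORDER BY part_dt DESC
    LIMIT 5000;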
I'm not very familiar with the query service and engine yet, but it feels like
there is still some room for performance improvement.
Looking forward to any comments.
Best Regards,
Yerui Sun
[email protected]