[
https://issues.apache.org/jira/browse/KYLIN-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xiaoxiang Yu closed KYLIN-4432.
-------------------------------
Resolved in release 3.1.0 (2020-07-03)
> Duplicated queries with sytax error take unexpect long time when lazy query
> enabled
> -----------------------------------------------------------------------------------
>
> Key: KYLIN-4432
> URL: https://issues.apache.org/jira/browse/KYLIN-4432
> Project: Kylin
> Issue Type: Bug
> Reporter: Congling Xia
> Assignee: Congling Xia
> Priority: Major
> Fix For: v3.1.0
>
>
> Hi! Our Kylin server becomes unresponsive recently. All query threads were
> being TIMED_WAITING so that no more thread in the thread pool of Tomcat could
> response new requests. The server logs are full of "Duplicate SQL request"s,
> and many requests retrying for so long time:
> {code:java}
> $ cat kylin.log.25 | grep 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991 | head
> -3
> 2020-03-23 12:15:32,279 INFO [Query
> 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 :
> Duplicated SQL request is running, waiting...
> 2020-03-23 12:15:32,379 INFO [Query
> 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 :
> Duplicated SQL request is running, waiting...
> 2020-03-23 12:15:32,479 INFO [Query
> 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 :
> Duplicated SQL request is running, waiting...
> $ cat kylin.log.25 | grep 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991 | tail
> -3
> 2020-03-23 12:26:56,920 INFO [Query
> 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 :
> Duplicated SQL request is running, waiting...
> 2020-03-23 12:26:57,020 INFO [Query
> 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 :
> Duplicated SQL request is running, waiting...
> 2020-03-23 12:26:57,121 INFO [Query
> 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 :
> Duplicated SQL request is running, waiting...
> {code}
> This can be re-produced easily:
> * enable lazy query (KYLIN-2897) by setting environment variable
> kylin.query.cache-enabled=true and kylin.query.lazy-query-enabled=true
> * send the same query with syntax error many times in several minutes
> The first query will quickly response with error message, and the following
> queries will executing for long time and will repeatly output "Duplicated SQL
> request is running, waiting...".
> The code in org.apache.kylin.rest.service.QueryService#searchQueryInCache
> indicates duplicated query will wait at most one minute with default setting
> of LazyQueryWaitingTimeoutMilliSeconds but in fact it waits quite longer than
> that.
> {code:java}
> while (response.isRunning()) {
> // Wait at most one minute
> if (System.currentTimeMillis() - response.getLazyQueryStartTime() >=
> getConfig()
> .getLazyQueryWaitingTimeoutMilliSeconds()) {
> cache.evict(sqlRequest.getCacheKey());
> return null;
> }
> logger.info("Duplicated SQL request is running, waiting...");
> try {
> Thread.sleep(100L);
> } catch (InterruptedException e) {
> }
> wrapper = cache.get(sqlRequest.getCacheKey());
> if (wrapper == null) {
> return null;
> }
> response = (SQLResponse) wrapper.get();
> if (response == null) {
> return null;
> }
> }
> {code}
> After some work-around, I find that the dummy response status remains
> unchanged after execution with exception. So the 2nd query will wait
> LazyQueryWaitingTimeoutMilliSeconds, evict the dummy response created by the
> 1st query from cache, and put a new dummy response into the cache. And the
> 3rd query may use the new dummy response's start time to compute timeout, and
> will wait for another LazyQueryWaitingTimeoutMilliSeconds.
> Suppose we send _n_ identical bad queries to the server quickly. The last
> query may wait for nearly _(n-1)_ times of
> LazyQueryWaitingTimeoutMilliSeconds. That's why the Web container's thread
> pool was exhausted – user program uses bad queries to check availability of
> Kylin service and retries many times in a short time.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)