[ https://issues.apache.org/jira/browse/KYLIN-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335353#comment-15335353 ]

liyang commented on KYLIN-1523:
-------------------------------

In the dump, many threads stay (hang?) at this point: a HashMap is being 
accessed concurrently. I believe a later version of Calcite has fixed this 
issue, because as of Kylin 1.5 + Calcite 1.6.0 the HashMap has been replaced 
by a ConcurrentMap. Upgrading to Kylin 1.5 should solve this problem.

{code}
"http-bio-7070-exec-273" daemon prio=10 tid=0x00007f86799c1800 nid=0x4f7d runnable [0x00007f76f63c8000]
   java.lang.Thread.State: RUNNABLE
        at java.util.HashMap.getEntry(HashMap.java:465)
        at java.util.HashMap.get(HashMap.java:417)
        at org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider.apply(ReflectiveRelMetadataProvider.java:251)
        at org.apache.calcite.rel.metadata.ChainedRelMetadataProvider.apply(ChainedRelMetadataProvider.java:60)
        at org.apache.calcite.rel.metadata.MetadataFactoryImpl$1.load(MetadataFactoryImpl.java:56)
        at org.apache.calcite.rel.metadata.MetadataFactoryImpl$1.load(MetadataFactoryImpl.java:53)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3579)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2372)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2335)
        - locked <0x00007f77f9d94458> (a com.google.common.cache.LocalCache$StrongEntry)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2250)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3980)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3984)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4868)
        at org.apache.calcite.rel.metadata.MetadataFactoryImpl.query(MetadataFactoryImpl.java:69)
        at org.apache.calcite.rel.AbstractRelNode.metadata(AbstractRelNode.java:271)
        at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:84)
        at org.apache.calcite.adapter.enumerable.EnumerableJoin.computeSelfCost(EnumerableJoin.java:79)
        at org.apache.kylin.query.relnode.OLAPJoinRel.computeSelfCost(OLAPJoinRel.java:99)
{code}
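
To illustrate the failure mode (a sketch only, not Calcite's actual code): 
pre-Java-8 HashMap can corrupt a bucket's linked list into a cycle when two 
threads trigger a resize at the same time, after which get() spins forever 
inside HashMap.getEntry() -- exactly the RUNNABLE frames above. A 
ConcurrentHashMap survives the same load, which is why the Calcite 1.6.0 
change matters. The class and method names below are made up for the demo.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HashMapRaceSketch {

    // Eight threads read and write the given map with no synchronization.
    static void hammer(final Map<Integer, Integer> map) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000000; i++) {
                        map.put(i, i); // concurrent put() may trigger a racing resize
                        map.get(i);    // on a corrupted table this can loop forever
                    }
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Safe: designed for concurrent access, completes reliably.
        hammer(new ConcurrentHashMap<Integer, Integer>());

        // Unsafe: on JDK 6/7 this can livelock nondeterministically,
        // reproducing the hang reported in this issue.
        hammer(new HashMap<Integer, Integer>());
    }
}
{code}

Note the threads in the dump are RUNNABLE, not BLOCKED: they are busy-spinning 
on a cyclic bucket list, so they never finish, never time out, and never reach 
an interruption point.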

> Kylin server hangs when many bad queries are running
> ----------------------------------------------------
>
>                 Key: KYLIN-1523
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1523
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: v1.4.0
>            Reporter: qianqiaoneng
>            Assignee: liyang
>            Priority: Critical
>         Attachments: kylin-425.jstack
>
>
> When there are bad queries running, the Kylin server gets exhausted and then 
> crashes. It would be better to have some control over the resources that bad 
> queries consume, or to kill bad queries after some timeout; they should not 
> be allowed to bring the server down. 
> The root cause is that too many slow queries like the ones below exhaust Kylin:
> 2016-03-21 14:28:33,487 INFO [BadQueryDetector] service.BadQueryDetector:57 : Slow query has been running 3557 seconds (thread id 0x1841) – 
> 2016-03-21 14:28:33,489 INFO [BadQueryDetector] service.BadQueryDetector:57 : Slow query has been running 3557 seconds (thread id 0x1840) – 
> 2016-03-21 14:28:33,490 INFO [BadQueryDetector] service.BadQueryDetector:57 : Slow query has been running 3556 seconds (thread id 0x1842) – 
> 2016-03-21 14:28:33,491 INFO [BadQueryDetector] service.BadQueryDetector:57 : Slow query has been running 3556 seconds (thread id 0x1843) – 
> 2016-03-21 14:28:33,493 INFO [BadQueryDetector] service.BadQueryDetector:57 : Slow query has been running 3556 seconds (thread id 0x1844) – 
> 2016-03-21 14:28:33,494 INFO [BadQueryDetector] service.BadQueryDetector:57 : Slow query has been running 3553 seconds (thread id 0x1845) – 
> 2016-03-21 14:28:33,495 INFO [BadQueryDetector] service.BadQueryDetector:57 : Slow query has been running 3553 seconds (thread id 0x1847) – 
> 2016-03-21 14:28:33,509 INFO [BadQueryDetector] service.BadQueryDetector:57 : Slow query has been running 3552 seconds (thread id 0x1848) – 
> 2016-03-21 14:28:33,512 INFO [BadQueryDetector] service.BadQueryDetector:57 : Slow query has been running 3551 seconds (thread id 0x184a) – 
> 2016-03-21 14:28:33,525 INFO [BadQueryDetector] service.BadQueryDetector:57 : Slow query has been running 3510 seconds (thread id 0x184c) – 
> 2016-03-21 14:28:33,554 INFO [BadQueryDetector] service.BadQueryDetector:57 : Slow query has been running 3510 seconds (thread id 0x184d) – 
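
The safeguard the description asks for would look roughly like the sketch 
below: query threads register with a start time, and a daemon periodically 
interrupts any query running past a timeout. This is an illustrative sketch 
only, not Kylin's actual BadQueryDetector; the class and method names are 
assumptions.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class QueryWatchdog extends Thread {

    private final Map<Thread, Long> running = new ConcurrentHashMap<Thread, Long>();
    private final long timeoutMillis;

    public QueryWatchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
        setDaemon(true); // must not keep the server alive on shutdown
    }

    // Called by a query thread just before it starts executing.
    public void register() {
        running.put(Thread.currentThread(), Long.valueOf(System.currentTimeMillis()));
    }

    // Called by a query thread when it finishes, success or failure.
    public void unregister() {
        running.remove(Thread.currentThread());
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            long now = System.currentTimeMillis();
            for (Map.Entry<Thread, Long> e : running.entrySet()) {
                if (now - e.getValue().longValue() > timeoutMillis) {
                    // Interrupt is only a request; the query code must check
                    // Thread.interrupted() at safe points to actually stop.
                    e.getKey().interrupt();
                }
            }
            try {
                Thread.sleep(60000); // scan once a minute
            } catch (InterruptedException ie) {
                return;
            }
        }
    }
}
{code}

One caveat: Thread.interrupt() only works if the query thread reaches an 
interruption point. Threads spinning inside HashMap.getEntry(), as in the dump 
above, never check the interrupt flag, so a detector can report them but 
cannot stop them. That is why upgrading to the ConcurrentMap-based Calcite 
matters more than any timeout.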


