[ 
https://issues.apache.org/jira/browse/IMPALA-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787851#comment-17787851
 ] 

Quanlong Huang edited comment on IMPALA-11409 at 11/20/23 7:31 AM:
-------------------------------------------------------------------

FWIW, if you see lots of impalad JVM threads whose stacks start from 
JniFrontend.getCatalogMetrics(), you are hitting this issue. Example stack traces:
{code:java}
"Thread-16 [LoadWithCaching for table list of database xxx]" #53 prio=5 os_prio=0 cpu=71396.21ms elapsed=85723.66s tid=0x000000000e314800 nid=0x6446 runnable  [0x00007f695d333000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.impala.service.FeSupport.NativeGetPartialCatalogObject(Native Method)
        at org.apache.impala.service.FeSupport.GetPartialCatalogObject(FeSupport.java:437)
        at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:389)
        at org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:183)
        at org.apache.impala.catalog.local.CatalogdMetaProvider$3.call(CatalogdMetaProvider.java:660)
        at org.apache.impala.catalog.local.CatalogdMetaProvider$3.call(CatalogdMetaProvider.java:655)
        at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:519)
        at org.apache.impala.catalog.local.CatalogdMetaProvider.loadTableList(CatalogdMetaProvider.java:652)
        at org.apache.impala.catalog.local.LocalDb.loadTableNames(LocalDb.java:170)
        at org.apache.impala.catalog.local.LocalDb.getAllTableNames(LocalDb.java:155)
        at org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:780)
        at org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:221)

"Thread-1382 [LoadWithCaching for table list of database xxx]" #1628 prio=5 os_prio=0 cpu=24.67ms elapsed=56167.64s tid=0x0000000012636000 nid=0x2d22 waiting on condition  [0x00007f69516c2000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x00000006847eccf0> (a java.util.concurrent.CompletableFuture$Signaller)
        at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
        at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1796)
        at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3128)
        at java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1823)
        at java.util.concurrent.CompletableFuture.get([email protected]/CompletableFuture.java:1998)
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:237)
        at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:512)
        at org.apache.impala.catalog.local.CatalogdMetaProvider.loadTableList(CatalogdMetaProvider.java:652)
        at org.apache.impala.catalog.local.LocalDb.loadTableNames(LocalDb.java:170)
        at org.apache.impala.catalog.local.LocalDb.getAllTableNames(LocalDb.java:155)
        at org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:780)
        at org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:221)
{code}
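
To check a dump for this symptom, one option is to count the threads whose stack mentions JniFrontend.getCatalogMetrics. The following is a minimal sketch, not an Impala tool: the function name CountThreadsWithFrame is hypothetical, and it assumes the usual jstack layout in which each thread entry starts with a line beginning with a double quote and each frame sits on a single line.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts thread entries in a jstack-style dump whose
// stack contains `frame` as a substring. Each thread is counted at most once.
// Assumption: a thread entry starts with a line whose first character is '"'.
std::size_t CountThreadsWithFrame(const std::string& dump,
                                  const std::string& frame) {
  std::istringstream in(dump);
  std::string line;
  bool in_thread = false;  // true once the first thread header has been seen
  bool matched = false;    // true once the current thread has been counted
  std::size_t count = 0;
  while (std::getline(in, line)) {
    if (!line.empty() && line[0] == '"') {
      // New thread header: reset the per-thread state.
      in_thread = true;
      matched = false;
      continue;
    }
    if (in_thread && !matched && line.find(frame) != std::string::npos) {
      matched = true;
      ++count;
    }
  }
  return count;
}
```

A large count for "JniFrontend.getCatalogMetrics" relative to the number of running queries would be consistent with the stack traces above.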


> Skip UpdateCatalogMetrics if another thread is on-going in it
> -------------------------------------------------------------
>
>                 Key: IMPALA-11409
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11409
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>         Attachments: jstack-1.txt
>
>
> The Impala coordinator tracks local metrics of the catalog, e.g. the number 
> of dbs/tables. When use_local_catalog is enabled, it also tracks the cache 
> metrics, e.g. cache hit/miss count/rate.
> These metrics are updated at the end of each statement, even for simple 
> statements like "USE <db>", "SET var=xxx", "SELECT 1". The catalog update 
> thread also updates the metrics at the end of each update.
> [https://github.com/apache/impala/blob/bb610dee09a8069bb993b4c668f7e481c1774b70/be/src/service/impala-server.cc#L1272]
> [https://github.com/apache/impala/blob/bb610dee09a8069bb993b4c668f7e481c1774b70/be/src/service/impala-server.cc#L2065]
> These metrics are global metrics of the local catalog cache. They are not 
> specific to a single statement, so updating them concurrently is wasted 
> work.
> [https://github.com/apache/impala/blob/bb610dee09a8069bb993b4c668f7e481c1774b70/be/src/service/impala-server.cc#L1526-L1559]
> We've seen "hanging issues" in which all statements, as well as the catalog 
> update thread, are slowly executing the UpdateCatalogMetrics() function. See 
> details in the attached jstack dump.
> Indeed, if one thread is already running the UpdateCatalogMetrics() 
> function, the other threads can skip it and move forward.
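
The skip-if-busy idea in the description can be sketched with a single atomic flag. This is only an illustration under stated assumptions, not the actual Impala change: the class CatalogMetricsUpdater, the method UpdateIfIdle, and the skipped counter are hypothetical names, and the real UpdateCatalogMetrics() in impala-server.cc may be guarded differently.

```cpp
#include <atomic>

// Hypothetical wrapper around a metrics refresh. At most one caller runs the
// refresh at a time; everyone else returns immediately instead of waiting.
class CatalogMetricsUpdater {
 public:
  // Runs `refresh` and returns true if no other update is in progress.
  // Otherwise records a skip and returns false without calling `refresh`.
  template <typename Fn>
  bool UpdateIfIdle(Fn&& refresh) {
    // exchange() returns the previous value: true means an update is running.
    if (busy_.exchange(true, std::memory_order_acquire)) {
      skipped_.fetch_add(1, std::memory_order_relaxed);
      return false;
    }
    // Clear the flag on every exit path, including exceptions from refresh.
    struct Release {
      std::atomic<bool>& flag;
      ~Release() { flag.store(false, std::memory_order_release); }
    } release{busy_};
    refresh();
    return true;
  }

  // Number of calls that were skipped because an update was in progress.
  long skipped() const { return skipped_.load(std::memory_order_relaxed); }

 private:
  std::atomic<bool> busy_{false};
  std::atomic<long> skipped_{0};
};
```

The trade-off is that a skipped caller may leave the metrics slightly stale until the in-flight update finishes. Since the metrics are global rather than per-statement, that seems acceptable here.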


