[
https://issues.apache.org/jira/browse/IMPALA-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787851#comment-17787851
]
Quanlong Huang edited comment on IMPALA-11409 at 11/20/23 7:31 AM:
-------------------------------------------------------------------
FWIW, if you see lots of impalad JVM threads whose stacks start from
JniFrontend.getCatalogMetrics(), you are hitting this issue. Example stacktraces:
{code:java}
"Thread-16 [LoadWithCaching for table list of database xxx]" #53 prio=5 os_prio=0 cpu=71396.21ms elapsed=85723.66s tid=0x000000000e314800 nid=0x6446 runnable [0x00007f695d333000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.impala.service.FeSupport.NativeGetPartialCatalogObject(Native Method)
    at org.apache.impala.service.FeSupport.GetPartialCatalogObject(FeSupport.java:437)
    at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:389)
    at org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:183)
    at org.apache.impala.catalog.local.CatalogdMetaProvider$3.call(CatalogdMetaProvider.java:660)
    at org.apache.impala.catalog.local.CatalogdMetaProvider$3.call(CatalogdMetaProvider.java:655)
    at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:519)
    at org.apache.impala.catalog.local.CatalogdMetaProvider.loadTableList(CatalogdMetaProvider.java:652)
    at org.apache.impala.catalog.local.LocalDb.loadTableNames(LocalDb.java:170)
    at org.apache.impala.catalog.local.LocalDb.getAllTableNames(LocalDb.java:155)
    at org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:780)
    at org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:221)

"Thread-1382 [LoadWithCaching for table list of database xxx]" #1628 prio=5 os_prio=0 cpu=24.67ms elapsed=56167.64s tid=0x0000000012636000 nid=0x2d22 waiting on condition [0x00007f69516c2000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
    - parking to wait for <0x00000006847eccf0> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
    at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1796)
    at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3128)
    at java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1823)
    at java.util.concurrent.CompletableFuture.get([email protected]/CompletableFuture.java:1998)
    at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:237)
    at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:512)
    at org.apache.impala.catalog.local.CatalogdMetaProvider.loadTableList(CatalogdMetaProvider.java:652)
    at org.apache.impala.catalog.local.LocalDb.loadTableNames(LocalDb.java:170)
    at org.apache.impala.catalog.local.LocalDb.getAllTableNames(LocalDb.java:155)
    at org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:780)
    at org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:221)
{code}
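To check a dump for this pattern, one option is to count the thread stacks that contain the JniFrontend.getCatalogMetrics frame. The following is a minimal sketch, not part of Impala: the function name CountThreadsWithFrame is hypothetical, and it assumes each thread entry in the jstack output begins with a line starting with a double quote, as in the traces above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Impala): counts the thread entries in a
// jstack dump whose stack contains the substring `frame`.
// Assumption: every thread entry starts with a header line that begins with
// a double quote, e.g. "Thread-16 [...]" #53 prio=5 ...
std::size_t CountThreadsWithFrame(const std::string& dump,
                                  const std::string& frame) {
  std::istringstream in(dump);
  std::string line;
  std::size_t count = 0;
  bool in_thread = false;  // true once the first thread header has been seen
  bool matched = false;    // true if the current thread was already counted
  while (std::getline(in, line)) {
    if (!line.empty() && line[0] == '"') {
      // A new thread entry starts here; reset the per-thread state.
      in_thread = true;
      matched = false;
    }
    if (in_thread && !matched && line.find(frame) != std::string::npos) {
      // Count each thread at most once, even if the frame appears repeatedly.
      matched = true;
      ++count;
    }
  }
  return count;
}
```

Passing the text of a jstack dump and the string "JniFrontend.getCatalogMetrics" gives the number of threads currently inside that call. What count should be treated as "lots" is a judgment call that depends on the workload.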
> Skip UpdateCatalogMetrics if another thread is on-going in it
> -------------------------------------------------------------
>
> Key: IMPALA-11409
> URL: https://issues.apache.org/jira/browse/IMPALA-11409
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Attachments: jstack-1.txt
>
>
> The Impala coordinator tracks local metrics of the catalog, e.g. the number
> of dbs/tables. When use_local_catalog is enabled, it also tracks the cache
> metrics, e.g. cache hit/miss count/rate.
> These metrics are updated at the end of each statement, even for simple
> statements like "USE <db>", "SET var=xxx", "SELECT 1". The catalog update
> thread also updates the metrics at the end of each update.
> [https://github.com/apache/impala/blob/bb610dee09a8069bb993b4c668f7e481c1774b70/be/src/service/impala-server.cc#L1272]
> [https://github.com/apache/impala/blob/bb610dee09a8069bb993b4c668f7e481c1774b70/be/src/service/impala-server.cc#L2065]
> These are global metrics of the local catalog cache. They are not
> specific to a single statement, so updating them concurrently from many
> threads is wasted work.
> [https://github.com/apache/impala/blob/bb610dee09a8069bb993b4c668f7e481c1774b70/be/src/service/impala-server.cc#L1526-L1559]
> We've seen "hanging" issues in which all statements, along with the catalog
> update thread, are slowly executing the UpdateCatalogMetrics() function. See
> the attached jstack dump for details.
> If one thread is already running UpdateCatalogMetrics(), the other threads
> can skip it and move on.
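
The skip-if-busy idea in the description can be sketched with an atomic flag. This is only an illustration under stated assumptions, not the actual Impala change: the class name SingleFlightUpdater and the method RunOrSkip are hypothetical, and the real UpdateCatalogMetrics() is represented by an arbitrary callable.

```cpp
#include <atomic>
#include <functional>

// Hypothetical helper (not Impala's actual code): runs `update` only if no
// other caller is currently inside it; otherwise returns immediately.
class SingleFlightUpdater {
 public:
  // Returns true if this call ran `update`, or false if it was skipped
  // because another caller was already running it.
  bool RunOrSkip(const std::function<void()>& update) {
    bool expected = false;
    // Only the caller that flips the flag from false to true does the work.
    if (!running_.compare_exchange_strong(expected, true)) {
      return false;
    }
    // Clear the flag on every exit path, including exceptions from `update`.
    struct Reset {
      std::atomic<bool>& flag;
      ~Reset() { flag.store(false); }
    } reset{running_};
    update();
    return true;
  }

 private:
  std::atomic<bool> running_{false};
};
```

A caller that is skipped does not wait, so it may return before the metrics reflect the latest state. That seems acceptable here because the metrics are global and the thread already inside the update will publish them.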