[ https://issues.apache.org/jira/browse/IMPALA-14068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang updated IMPALA-14068: ------------------------------------ Description: When running ALTER TABLE RECOVER PARTITIONS on a large table, HMS event processor is blocked in processing the AddPartition events of the table until the operation finishes. This introduces a significant lag in HMS event processing. Stacktrace of the EventProcessor: {noformat} "MetastoreEventsProcessor-ProcessEvents [Processing ADD_PARTITION on scale_4k_500cols_db.tbl1, eventId=85020] [Attempting to write lock table scale_4k_500cols_db.tbl1 with a timeout of 7200000 ms]" #72 daemon prio=5 os_prio=0 tid=0x000000001255f000 nid=0x324a waiting on condition [0x00007fe393372000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.impala.catalog.CatalogServiceCatalog.tryLock(CatalogServiceCatalog.java:617) at org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:513) at org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:520) at org.apache.impala.service.CatalogOpExecutor.tryWriteLock(CatalogOpExecutor.java:7602) at org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4469) at org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.processTableEvent(MetastoreEvents.java:2479) at org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1304) at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:696) at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1302) - locked <0x00007fe3ecb5a458> (a org.apache.impala.catalog.events.MetastoreEventsProcessor) at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1045) at org.apache.impala.catalog.events.MetastoreEventsProcessor$$Lambda$60/45320991.run(Unknown Source) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748){noformat} Table lock is held by the AlterTableRecoverPartitions operation {noformat} "Thread-7783 [ALTER_TABLE scale_4k_500cols_db.tbl1 issued by imp...@halxg.cloudera.com] [Recovering 4000 partitions for scale_4k_500cols_db.tbl1] [Loading file and block metadata for 4000 paths for table scale_4k_500cols_db.tbl1]" #44181 prio=5 os_prio=0 tid=0x0000000010e22000 nid=0x5a6e waiting on condition [0x00007fe389713000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00007feb087e57a8> (a java.util.concurrent.FutureTask) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429) at java.util.concurrent.FutureTask.get(FutureTask.java:191) at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:192) at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:140) at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:817) at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:770) at org.apache.impala.catalog.HdfsTable.createAndLoadPartitions(HdfsTable.java:966) at org.apache.impala.service.CatalogOpExecutor.addHdfsPartitions(CatalogOpExecutor.java:1663) at org.apache.impala.service.CatalogOpExecutor.alterTableRecoverPartitions(CatalogOpExecutor.java:6026) at org.apache.impala.service.CatalogOpExecutor.alterTable(CatalogOpExecutor.java:1298) at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:454) at org.apache.impala.service.JniCatalog.lambda$execDdl$3(JniCatalog.java:302) at org.apache.impala.service.JniCatalog$$Lambda$105/494935334.call(Unknown Source) at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90) at org.apache.impala.service.JniCatalogOp$$Lambda$67/903378472.call(Unknown Source) at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58) at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89) at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100) at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:231) at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:245) at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:301){noformat} We need to improve the self-event detection for such AddPartition events. CC [~VenuReddy], [~hemanth619] was: When running ALTER TABLE RECOVER PARTITIONS on a large table, HMS event processor is blocked in processing the AddPartition events of the table until the operation finishes. This introduces a significant lag in HMS event processing. Stacktrace of the EventProcessor: {noformat} "MetastoreEventsProcessor-ProcessEvents [Processing ADD_PARTITION on scale_4k_500cols_db.tbl1, eventId=85020] [Attempting to write lock table scale_4k_500cols_db.tbl1 with a timeout of 7200000 ms]" #72 daemon prio=5 os_prio=0 tid=0x000000001255f000 nid=0x324a waiting on condition [0x00007fe393372000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.impala.catalog.CatalogServiceCatalog.tryLock(CatalogServiceCatalog.java:617) at org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:513) at org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:520) at org.apache.impala.service.CatalogOpExecutor.tryWriteLock(CatalogOpExecutor.java:7602) at org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4469) at org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.processTableEvent(MetastoreEvents.java:2479) at org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1304) at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:696) at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1302) - locked <0x00007fe3ecb5a458> (a org.apache.impala.catalog.events.MetastoreEventsProcessor) at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1045) at org.apache.impala.catalog.events.MetastoreEventsProcessor$$Lambda$60/45320991.run(Unknown Source) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748){noformat} Table lock is held by the AlterTableRecoverPartitions operation {noformat} "Thread-7784 [ALTER_TABLE scale_4k_500cols_db.tbl3 issued by imp...@halxg.cloudera.com] [Recovering 4000 partitions for scale_4k_500cols_db.tbl3]" #44184 prio=5 os_prio=0 tid=0x000000000f252800 nid=0x5a72 runnable [0x00007fe388d10000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:171) at java.net.SocketInputStream.read(SocketInputStream.java:141) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) - locked <0x00007fe3ed128068> (a java.io.BufferedInputStream) at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:177) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109) at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:338) at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:421) at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:397) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:39) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109) at org.apache.hadoop.hive.metastore.security.TFilterTransport.readAll(TFilterTransport.java:63) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:362) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:245) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_add_partitions_req(ThriftHiveMetastore.java:2944) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.add_partitions_req(ThriftHiveMetastore.java:2931) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:910) at sun.reflect.GeneratedMethodAccessor89.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:209) at com.sun.proxy.$Proxy31.add_partitions(Unknown Source) at org.apache.impala.util.MetaStoreUtil.addPartitions(MetaStoreUtil.java:222) at org.apache.impala.service.CatalogOpExecutor.addHmsPartitions(CatalogOpExecutor.java:5041) at org.apache.impala.service.CatalogOpExecutor.alterTableRecoverPartitions(CatalogOpExecutor.java:6024) at org.apache.impala.service.CatalogOpExecutor.alterTable(CatalogOpExecutor.java:1298) at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:454) at org.apache.impala.service.JniCatalog.lambda$execDdl$3(JniCatalog.java:302) at org.apache.impala.service.JniCatalog$$Lambda$105/494935334.call(Unknown Source) at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90) at org.apache.impala.service.JniCatalogOp$$Lambda$67/903378472.call(Unknown Source) at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58) at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89) at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100) at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:231) at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:245) at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:301){noformat} We need to improve the self-event detection for such AddPartition events. CC [~VenuReddy], [~hemanth619] > AlterTableRecoverPartitions blocks HMS event processor > ------------------------------------------------------ > > Key: IMPALA-14068 > URL: https://issues.apache.org/jira/browse/IMPALA-14068 > Project: IMPALA > Issue Type: Bug > Reporter: Quanlong Huang > Priority: Major > > When running ALTER TABLE RECOVER PARTITIONS on a large table, HMS event > processor is blocked in processing the AddPartition events of the table until > the operation finishes. This introduces a significant lag in HMS event > processing. > Stacktrace of the EventProcessor: > {noformat} > "MetastoreEventsProcessor-ProcessEvents [Processing ADD_PARTITION on > scale_4k_500cols_db.tbl1, eventId=85020] [Attempting to write lock table > scale_4k_500cols_db.tbl1 with a timeout of 7200000 ms]" #72 daemon prio=5 > os_prio=0 tid=0x000000001255f000 nid=0x324a waiting on condition > [0x00007fe393372000] > java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.impala.catalog.CatalogServiceCatalog.tryLock(CatalogServiceCatalog.java:617) > at > org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:513) > at > org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:520) > at > org.apache.impala.service.CatalogOpExecutor.tryWriteLock(CatalogOpExecutor.java:7602) > at > org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4469) > at > org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.processTableEvent(MetastoreEvents.java:2479) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1304) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:696) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1302) > - locked <0x00007fe3ecb5a458> (a > org.apache.impala.catalog.events.MetastoreEventsProcessor) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1045) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor$$Lambda$60/45320991.run(Unknown > Source) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){noformat} > Table lock is held by the AlterTableRecoverPartitions operation > {noformat} > "Thread-7783 [ALTER_TABLE scale_4k_500cols_db.tbl1 issued by > imp...@halxg.cloudera.com] [Recovering 4000 partitions for > scale_4k_500cols_db.tbl1] [Loading file and block metadata for 4000 paths for > table scale_4k_500cols_db.tbl1]" #44181 prio=5 os_prio=0 > tid=0x0000000010e22000 nid=0x5a6e waiting on condition [0x00007fe389713000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00007feb087e57a8> (a > java.util.concurrent.FutureTask) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429) > at java.util.concurrent.FutureTask.get(FutureTask.java:191) > at > org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:192) > at > org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:140) > at > org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:817) > at > org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:770) > at > org.apache.impala.catalog.HdfsTable.createAndLoadPartitions(HdfsTable.java:966) > at > org.apache.impala.service.CatalogOpExecutor.addHdfsPartitions(CatalogOpExecutor.java:1663) > at > org.apache.impala.service.CatalogOpExecutor.alterTableRecoverPartitions(CatalogOpExecutor.java:6026) > at > org.apache.impala.service.CatalogOpExecutor.alterTable(CatalogOpExecutor.java:1298) > at > org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:454) > at > org.apache.impala.service.JniCatalog.lambda$execDdl$3(JniCatalog.java:302) > at > org.apache.impala.service.JniCatalog$$Lambda$105/494935334.call(Unknown > Source) > at > org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90) > at > org.apache.impala.service.JniCatalogOp$$Lambda$67/903378472.call(Unknown > Source) > at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58) > at > org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89) > at > org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100) > at > org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:231) > at > org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:245) > at > org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:301){noformat} > We need to improve the self-event detection for such AddPartition events. > CC [~VenuReddy], [~hemanth619] -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org