[ 
https://issues.apache.org/jira/browse/IMPALA-14068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-14068:
------------------------------------
    Description: 
When running ALTER TABLE RECOVER PARTITIONS on a large table, HMS event 
processor is blocked in processing the AddPartition events of the table until 
the operation finishes. This introduces a significant lag in HMS event 
processing.

Stacktrace of the EventProcessor:
{noformat}
"MetastoreEventsProcessor-ProcessEvents [Processing ADD_PARTITION on 
scale_4k_500cols_db.tbl1, eventId=85020] [Attempting to write lock table 
scale_4k_500cols_db.tbl1 with a timeout of 7200000 ms]" #72 daemon prio=5 
os_prio=0 tid=0x000000001255f000 nid=0x324a waiting on condition 
[0x00007fe393372000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at 
org.apache.impala.catalog.CatalogServiceCatalog.tryLock(CatalogServiceCatalog.java:617)
        at 
org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:513)
        at 
org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:520)
        at 
org.apache.impala.service.CatalogOpExecutor.tryWriteLock(CatalogOpExecutor.java:7602)
        at 
org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4469)
        at 
org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.processTableEvent(MetastoreEvents.java:2479)
        at 
org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1304)
        at 
org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:696)
        at 
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1302)
        - locked <0x00007fe3ecb5a458> (a 
org.apache.impala.catalog.events.MetastoreEventsProcessor)
        at 
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1045)
        at 
org.apache.impala.catalog.events.MetastoreEventsProcessor$$Lambda$60/45320991.run(Unknown
 Source)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748){noformat}
Table lock is held by the AlterTableRecoverPartitions operation
{noformat}
"Thread-7783 [ALTER_TABLE scale_4k_500cols_db.tbl1 issued by 
imp...@halxg.cloudera.com] [Recovering 4000 partitions for 
scale_4k_500cols_db.tbl1] [Loading file and block metadata for 4000 paths for 
table scale_4k_500cols_db.tbl1]" #44181 prio=5 os_prio=0 tid=0x0000000010e22000 
nid=0x5a6e waiting on condition [0x00007fe389713000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00007feb087e57a8> (a 
java.util.concurrent.FutureTask)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
        at java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at 
org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:192)
        at 
org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:140)
        at 
org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:817)
        at 
org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:770)
        at 
org.apache.impala.catalog.HdfsTable.createAndLoadPartitions(HdfsTable.java:966)
        at 
org.apache.impala.service.CatalogOpExecutor.addHdfsPartitions(CatalogOpExecutor.java:1663)
        at 
org.apache.impala.service.CatalogOpExecutor.alterTableRecoverPartitions(CatalogOpExecutor.java:6026)
        at 
org.apache.impala.service.CatalogOpExecutor.alterTable(CatalogOpExecutor.java:1298)
        at 
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:454)
        at 
org.apache.impala.service.JniCatalog.lambda$execDdl$3(JniCatalog.java:302)
        at 
org.apache.impala.service.JniCatalog$$Lambda$105/494935334.call(Unknown Source)
        at 
org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
        at 
org.apache.impala.service.JniCatalogOp$$Lambda$67/903378472.call(Unknown Source)
        at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
        at 
org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
        at 
org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
        at 
org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:231)
        at 
org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:245)
        at 
org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:301){noformat}
We need to improve the self-event detection for such AddPartition events.

CC [~VenuReddy], [~hemanth619]

  was:
When running ALTER TABLE RECOVER PARTITIONS on a large table, HMS event 
processor is blocked in processing the AddPartition events of the table until 
the operation finishes. This introduces a significant lag in HMS event 
processing.

Stacktrace of the EventProcessor:
{noformat}
"MetastoreEventsProcessor-ProcessEvents [Processing ADD_PARTITION on 
scale_4k_500cols_db.tbl1, eventId=85020] [Attempting to write lock table 
scale_4k_500cols_db.tbl1 with a timeout of 7200000 ms]" #72 daemon prio=5 
os_prio=0 tid=0x000000001255f000 nid=0x324a waiting on condition 
[0x00007fe393372000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at 
org.apache.impala.catalog.CatalogServiceCatalog.tryLock(CatalogServiceCatalog.java:617)
        at 
org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:513)
        at 
org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:520)
        at 
org.apache.impala.service.CatalogOpExecutor.tryWriteLock(CatalogOpExecutor.java:7602)
        at 
org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4469)
        at 
org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.processTableEvent(MetastoreEvents.java:2479)
        at 
org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1304)
        at 
org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:696)
        at 
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1302)
        - locked <0x00007fe3ecb5a458> (a 
org.apache.impala.catalog.events.MetastoreEventsProcessor)
        at 
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1045)
        at 
org.apache.impala.catalog.events.MetastoreEventsProcessor$$Lambda$60/45320991.run(Unknown
 Source)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748){noformat}
Table lock is held by the AlterTableRecoverPartitions operation
{noformat}
"Thread-7784 [ALTER_TABLE scale_4k_500cols_db.tbl3 issued by 
imp...@halxg.cloudera.com] [Recovering 4000 partitions for 
scale_4k_500cols_db.tbl3]" #44184 prio=5 os_prio=0 tid=0x000000000f252800 
nid=0x5a72 runnable [0x00007fe388d10000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        - locked <0x00007fe3ed128068> (a java.io.BufferedInputStream)
        at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:177)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109)
        at 
org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:338)
        at 
org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:421)
        at 
org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:397)
        at 
org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:39)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109)
        at 
org.apache.hadoop.hive.metastore.security.TFilterTransport.readAll(TFilterTransport.java:63)
        at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464)
        at 
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:362)
        at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:245)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
        at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_add_partitions_req(ThriftHiveMetastore.java:2944)
        at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.add_partitions_req(ThriftHiveMetastore.java:2931)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:910)
        at sun.reflect.GeneratedMethodAccessor89.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:209)
        at com.sun.proxy.$Proxy31.add_partitions(Unknown Source)
        at 
org.apache.impala.util.MetaStoreUtil.addPartitions(MetaStoreUtil.java:222)
        at 
org.apache.impala.service.CatalogOpExecutor.addHmsPartitions(CatalogOpExecutor.java:5041)
        at 
org.apache.impala.service.CatalogOpExecutor.alterTableRecoverPartitions(CatalogOpExecutor.java:6024)
        at 
org.apache.impala.service.CatalogOpExecutor.alterTable(CatalogOpExecutor.java:1298)
        at 
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:454)
        at 
org.apache.impala.service.JniCatalog.lambda$execDdl$3(JniCatalog.java:302)
        at 
org.apache.impala.service.JniCatalog$$Lambda$105/494935334.call(Unknown Source)
        at 
org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
        at 
org.apache.impala.service.JniCatalogOp$$Lambda$67/903378472.call(Unknown Source)
        at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
        at 
org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
        at 
org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
        at 
org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:231)
        at 
org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:245)
        at 
org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:301){noformat}
We need to improve the self-event detection for such AddPartition events.

CC [~VenuReddy], [~hemanth619]


> AlterTableRecoverPartitions blocks HMS event processor
> ------------------------------------------------------
>
>                 Key: IMPALA-14068
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14068
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Priority: Major
>
> When running ALTER TABLE RECOVER PARTITIONS on a large table, HMS event 
> processor is blocked in processing the AddPartition events of the table until 
> the operation finishes. This introduces a significant lag in HMS event 
> processing.
> Stacktrace of the EventProcessor:
> {noformat}
> "MetastoreEventsProcessor-ProcessEvents [Processing ADD_PARTITION on 
> scale_4k_500cols_db.tbl1, eventId=85020] [Attempting to write lock table 
> scale_4k_500cols_db.tbl1 with a timeout of 7200000 ms]" #72 daemon prio=5 
> os_prio=0 tid=0x000000001255f000 nid=0x324a waiting on condition 
> [0x00007fe393372000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.impala.catalog.CatalogServiceCatalog.tryLock(CatalogServiceCatalog.java:617)
>         at 
> org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:513)
>         at 
> org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:520)
>         at 
> org.apache.impala.service.CatalogOpExecutor.tryWriteLock(CatalogOpExecutor.java:7602)
>         at 
> org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4469)
>         at 
> org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.processTableEvent(MetastoreEvents.java:2479)
>         at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1304)
>         at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:696)
>         at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1302)
>         - locked <0x00007fe3ecb5a458> (a 
> org.apache.impala.catalog.events.MetastoreEventsProcessor)
>         at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1045)
>         at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor$$Lambda$60/45320991.run(Unknown
>  Source)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748){noformat}
> Table lock is held by the AlterTableRecoverPartitions operation
> {noformat}
> "Thread-7783 [ALTER_TABLE scale_4k_500cols_db.tbl1 issued by 
> imp...@halxg.cloudera.com] [Recovering 4000 partitions for 
> scale_4k_500cols_db.tbl1] [Loading file and block metadata for 4000 paths for 
> table scale_4k_500cols_db.tbl1]" #44181 prio=5 os_prio=0 
> tid=0x0000000010e22000 nid=0x5a6e waiting on condition [0x00007fe389713000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007feb087e57a8> (a 
> java.util.concurrent.FutureTask)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>         at 
> org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:192)
>         at 
> org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:140)
>         at 
> org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:817)
>         at 
> org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:770)
>         at 
> org.apache.impala.catalog.HdfsTable.createAndLoadPartitions(HdfsTable.java:966)
>         at 
> org.apache.impala.service.CatalogOpExecutor.addHdfsPartitions(CatalogOpExecutor.java:1663)
>         at 
> org.apache.impala.service.CatalogOpExecutor.alterTableRecoverPartitions(CatalogOpExecutor.java:6026)
>         at 
> org.apache.impala.service.CatalogOpExecutor.alterTable(CatalogOpExecutor.java:1298)
>         at 
> org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:454)
>         at 
> org.apache.impala.service.JniCatalog.lambda$execDdl$3(JniCatalog.java:302)
>         at 
> org.apache.impala.service.JniCatalog$$Lambda$105/494935334.call(Unknown 
> Source)
>         at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>         at 
> org.apache.impala.service.JniCatalogOp$$Lambda$67/903378472.call(Unknown 
> Source)
>         at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>         at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>         at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
>         at 
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:231)
>         at 
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:245)
>         at 
> org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:301){noformat}
> We need to improve the self-event detection for such AddPartition events.
> CC [~VenuReddy], [~hemanth619]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to