Quanlong Huang created IMPALA-12561:
---------------------------------------
Summary: Event-processor shouldn't go into ERROR state for
failures in fetching events
Key: IMPALA-12561
URL: https://issues.apache.org/jira/browse/IMPALA-12561
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang
Since IMPALA-8240, the event-processor retries when it hits a
MetastoreNotificationFetchException. However, there are several places where
HMS failures in fetching events are not yet converted into
MetastoreNotificationFetchException:
1. getNextMetastoreEvents() throws IllegalStateException if it fails to create
a MetaStoreClient.
{noformat}
E1024 05:00:58.458434 258 MetastoreEventsProcessor.java:888] Unexpected
exception received while processing event
Java exception follows:
java.lang.IllegalStateException: java.lang.RuntimeException: Unable to
instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient at
org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:105)
at
org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:78)
at
org.apache.impala.catalog.MetaStoreClientPool.getClient(MetaStoreClientPool.java:205)
at
org.apache.impala.catalog.Catalog.getMetaStoreClient(Catalog.java:397) at
org.apache.impala.catalog.events.MetastoreEventsProcessor.getNextMetastoreEvents(MetastoreEventsProcessor.java:802)
at
org.apache.impala.catalog.events.MetastoreEventsProcessor.getNextMetastoreEvents(MetastoreEventsProcessor.java:848)
at
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:869)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.metastore.HiveMetaStoreClient at
org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:86)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:98)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:151)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:122)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:115)
at
org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:99)
... 13 more
Caused by: java.lang.reflect.InvocationTargetException at
sun.reflect.GeneratedConstructorAccessor948.newInstance(Unknown Source) at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at
org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:84)
... 18 more
Caused by: MetaException(message:Could not connect to meta
store using any of the URIs provided. Most recent failure:
org.apache.thrift.transport.TTransportException: Peer indicated failure:
Failure to initialize security context at
org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:171)
at
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:244) at
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:39)
at
org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:51)
at
org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:48)
at java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:422) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at
org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport.open(TUGIAssumingTransport.java:48)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:758)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:271)
at sun.reflect.GeneratedConstructorAccessor948.newInstance(Unknown Source)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at
org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:84)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:98)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:151)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:122)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:115)
at
org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:99)
at
org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:78)
at
org.apache.impala.catalog.MetaStoreClientPool.getClient(MetaStoreClientPool.java:205)
at
org.apache.impala.catalog.Catalog.getMetaStoreClient(Catalog.java:397) at
org.apache.impala.catalog.events.MetastoreEventsProcessor.getNextMetastoreEvents(MetastoreEventsProcessor.java:802)
at
org.apache.impala.catalog.events.MetastoreEventsProcessor.getNextMetastoreEvents(MetastoreEventsProcessor.java:848)
at
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:869)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:829)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:271)
... 22 more
{noformat}
2. processEvents() doesn't handle the failures of getCurrentEventId() as
MetastoreNotificationFetchExceptions. Instead, getCurrentEventId() throws
CatalogException:
{code}
E1114 16:01:11.121475 28921 MetastoreEventsProcessor.java:942] Unexpected
exception received while processing event
Java exception follows:
org.apache.impala.catalog.CatalogException: Unable to fetch the current
notification event id. Check if metastore service is accessible
at
org.apache.impala.catalog.events.MetastoreEventsProcessor.getCurrentEventId(MetastoreEventsProcessor.java:744)
at
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:922)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
E1114 16:01:11.136973 28921 MetastoreEventsProcessor.java:1190] Notification
event is null
W1114 16:01:11.137122 28921 MetastoreEventsProcessor.java:913] Event processing
is skipped since status is ERROR. Last synced event id is 8406252
{code}
The event-processor should distinguish these HMS errors and not go into the
ERROR state, so that it can keep retrying until the connection to HMS is back
to normal.
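
The intended behavior could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: EventFetchSketch, FetchException, HmsCall, fetch() and the counters are hypothetical stand-ins (FetchException plays the role of MetastoreNotificationFetchException), and the real MetastoreEventsProcessor code is more involved. The idea is that every failure while talking to HMS to fetch events is wrapped into the fetch exception, which leaves the status unchanged so the next scheduled round retries, while failures in applying events still move the processor to ERROR.

```java
import java.util.function.LongConsumer;

class EventFetchSketch {
  /** Hypothetical stand-in for MetastoreNotificationFetchException. */
  static class FetchException extends Exception {
    FetchException(String msg, Throwable cause) { super(msg, cause); }
  }

  /** Simplified status; only the two states needed for the sketch. */
  enum Status { ACTIVE, ERROR }

  /** Hypothetical HMS call, e.g. creating a client or fetching an event id. */
  interface HmsCall<T> { T call() throws Exception; }

  /** Converts any failure of an HMS fetch call into a FetchException. */
  static <T> T fetch(String what, HmsCall<T> call) throws FetchException {
    try {
      return call.call();
    } catch (Exception e) {
      // Covers IllegalStateException from client creation and
      // CatalogException-like failures alike.
      throw new FetchException("Unable to fetch " + what, e);
    }
  }

  private Status status = Status.ACTIVE;
  private long lastSyncedEventId = -1;
  private int fetchFailures = 0;

  Status status() { return status; }
  long lastSyncedEventId() { return lastSyncedEventId; }
  int fetchFailures() { return fetchFailures; }

  /** One scheduled round: fetch the current event id, then apply events. */
  void processEvents(HmsCall<Long> getCurrentEventId, LongConsumer applyEvents) {
    if (status != Status.ACTIVE) return;  // skipped, as in the log above
    try {
      long current = fetch("the current notification event id", getCurrentEventId);
      applyEvents.accept(current);
      lastSyncedEventId = current;
    } catch (FetchException e) {
      // HMS is unreachable: stay ACTIVE so the next round retries.
      fetchFailures++;
    } catch (RuntimeException e) {
      // A failure in applying events is not a fetch failure.
      status = Status.ERROR;
    }
  }

  public static void main(String[] args) {
    EventFetchSketch p = new EventFetchSketch();
    // Simulate HMS being down for one round, then coming back.
    p.processEvents(() -> { throw new IllegalStateException("HMS down"); }, id -> {});
    p.processEvents(() -> 8406252L, id -> {});
    System.out.println(p.status() + " " + p.lastSyncedEventId());
  }
}
```

With this shape, a round that fails to reach HMS only increments a failure counter, and the processor resumes from the last synced event id once the fetch succeeds again.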