[
https://issues.apache.org/jira/browse/SENTRY-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289816#comment-16289816
]
Liam Sargent commented on SENTRY-1753:
--------------------------------------
[~akolb] The point of the exponential backoff is to reduce log messages.
bq. What is the value of any delay other then zero? Why do you want tto delay
connection attempts? This just increases startup latency when Sentry restarts
after reboot.
This config item currently exists and is set to zero. I left this step in
because it is part of the current pattern.
bq. 2.) Attempt to connect to HMS,
So far so good...
_if fail due to no service, wait configurable initial delay amount and try
again.
_
We want the first retry attempts to be quick, within a few seconds of each
other, but if the service is not responding right away, we should wait a little
longer before trying again. Using exponential backoff means the retries will
not be fired off every half a second. It is not necessary to use exponential
backoff at all, but I think it is a clever way of reducing noise without
sacrificing. The current stack trace for a single HMSFollower connect failure
(which occurs every half second for almost 3 minutes on a full restart) looks
like:
9:45:46.816 AM WARN UserGroupInformation
PriviledgedActionException
as:sentry/[email protected] (auth:KERBEROS)
cause:MetaException(message:Could not connect to meta store using any of the
URIs provided. Most recent failure:
org.apache.thrift.transport.TTransportException: java.net.ConnectException:
Connection refused (Connection refused)
at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
at
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:266)
at
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
at
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:472)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:252)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187)
at
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:117)
at
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:114)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory.connect(HiveSimpleConnectionFactory.java:114)
at
org.apache.sentry.service.thrift.SentryHMSClient.connect(SentryHMSClient.java:106)
at
org.apache.sentry.service.thrift.HMSFollower.syncupWithHms(HMSFollower.java:177)
at
org.apache.sentry.service.thrift.HMSFollower.run(HMSFollower.java:150)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
... 27 more
)
9:45:46.816 AM ERROR HMSFollower
HMSFollower cannot connect to HMS!!
java.lang.reflect.UndeclaredThrowableException
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1930)
at
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory.connect(HiveSimpleConnectionFactory.java:114)
at
org.apache.sentry.service.thrift.SentryHMSClient.connect(SentryHMSClient.java:106)
at
org.apache.sentry.service.thrift.HMSFollower.syncupWithHms(HMSFollower.java:177)
at
org.apache.sentry.service.thrift.HMSFollower.run(HMSFollower.java:150)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: MetaException(message:Could not connect to meta store using any of
the URIs provided. Most recent failure:
org.apache.thrift.transport.TTransportException: java.net.ConnectException:
Connection refused (Connection refused)
at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
at
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:266)
at
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
at
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:472)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:252)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187)
at
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:117)
at
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:114)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory.connect(HiveSimpleConnectionFactory.java:114)
at
org.apache.sentry.service.thrift.SentryHMSClient.connect(SentryHMSClient.java:106)
at
org.apache.sentry.service.thrift.HMSFollower.syncupWithHms(HMSFollower.java:177)
at
org.apache.sentry.service.thrift.HMSFollower.run(HMSFollower.java:150)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
... 27 more
)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:520)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:252)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187)
at
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:117)
at
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:114)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
... 11 more
So, we could either reduce the number of failure messages in the average case
by backing off retry time, or we could make the log message less verbose.
> Make HMSFollower initial delay configurable
> -------------------------------------------
>
> Key: SENTRY-1753
> URL: https://issues.apache.org/jira/browse/SENTRY-1753
> Project: Sentry
> Issue Type: Task
> Components: Sentry
> Affects Versions: 2.0.0
> Reporter: Na Li
> Assignee: Liam Sargent
> Priority: Minor
>
> HMSFollower initial delay affects how fast it can get notification from Hive
> meta store. We are changing it from 60 seconds to 0 second. It may cause some
> issue. It is better to make this value configurable, so we can adjust it
> based on testing result.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)