clintropolis opened a new pull request #7234: lifecycle stage refactor to ensure proper start and stop ordering of servers and announcements URL: https://github.com/apache/incubator-druid/pull/7234 This fixes an issue introduced by #7215. `HttpServerInventoryView` uses the 'discovery' announcement so would correctly identify servers that disappear, but `AbstractCuratorServerInventoryView` watches another announcement that stops later, causing the 'server disappeared' event to happen after jetty has stopped. This PR resolves this issue by slightly reworking the `Lifecycle.Stage` enumeration and renaming them to be more relevant to their intended usage, to quote the javadocs: >Conceptually, the stages have the following purposes: >- Stage.INIT: Currently, this stage is used exclusively for log4j initialization, since almost everything needs logging and it should be the last thing to shutdown. Any sort of bootstrapping object that provides something that should be initialized before nearly all other Lifecycle objects could also belong here (if it doesn't need logging during start or stop). >- Stage.NORMAL: This is the default stage. Most objects will probably make the most sense to be registered at this level, with the exception of any form of server or service announcements >- Stage.SERVER: This lifecycle stage is intended for all 'server' objects, and currently only contains the Jetty module, but any sort of 'server' that expects most Lifecycle objects to be initialized by the time it starts, and still available at the time it stops can logically live in this stage. >- Stage.ANNOUNCENTS: Any object which announces to a cluster this servers location belongs in this stage. By being last, we can be sure that all servers are initialized before we advertise the endpoint locations, and also can be sure that we un-announce these advertisements prior to the Stage.SERVER objects stop. This has the very nice behavior that all announcements can be unannounced prior to stopping the server, meaning `druid.server.http.unannouncePropagationDelay` should apply to all announcements to give adequate time for the rest of the cluster to notice all announcements are gone. Logs of shutdown after this PR for curator based segment management: ``` historical: 2019-03-11T22:03:59,561 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Lifecycle [module] running shutdown hook 2019-03-11T22:03:59,563 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS] 2019-03-11T22:03:59,565 INFO [Thread-60] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannouncing [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/historical', host='localhost', bindOnHost=false, port=-1, plaintextPort=8083, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='HISTORICAL', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=10000000000, type=historical, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}]. 2019-03-11T22:03:59,565 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/internal-discovery/HISTORICAL/localhost:8083] 2019-03-11T22:03:59,587 INFO [Thread-60] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/historical', host='localhost', bindOnHost=false, port=-1, plaintextPort=8083, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='HISTORICAL', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=10000000000, type=historical, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}]. 2019-03-11T22:03:59,587 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.announcement.Announcer.stop()] on object[org.apache.druid.curator.announcement.Announcer@11a3a45f]. 2019-03-11T22:03:59,587 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - Stopping announcer 2019-03-11T22:03:59,589 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/localhost:8083] 2019-03-11T22:03:59,592 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/listeners/lookups/__default/http:localhost:8083] broker (curator segment discovery): 2019-03-11T22:03:59,592 INFO [ServerInventoryView-0] org.apache.druid.client.BatchServerInventoryView - Server Disappeared[DruidServerMetadata{name='localhost:8083', hostAndPort='localhost:8083', hostAndTlsPort='null', maxSize=10000000000, tier='_default_tier', type=historical, priority=0}] 2019-03-11T22:03:59,593 INFO [BrokerServerView-0] org.apache.druid.sql.calcite.schema.DruidSchema - Removed all metadata for dataSource[wikiticker]. historical: 2019-03-11T22:03:59,595 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/segments/localhost:8083/localhost:8083_historical__default_tier_2019-03-11T22:03:25.234Z_faed3053e88042fcac4096395c3a58d70] 2019-03-11T22:03:59,602 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER] 2019-03-11T22:03:59,602 INFO [Thread-60] org.apache.druid.server.initialization.jetty.JettyServerModule - Waiting 15000 ms for unannouncement to propagate. 2019-03-11T22:04:14,603 INFO [Thread-60] org.apache.druid.server.initialization.jetty.JettyServerModule - Stopping Jetty Server... 2019-03-11T22:04:14,614 INFO [Thread-60] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@1efac5b9{HTTP/1.1,[http/1.1]}{0.0.0.0:8083} 2019-03-11T22:04:14,614 INFO [Thread-60] org.eclipse.jetty.server.session - node0 Stopped scavenging 2019-03-11T22:04:14,616 INFO [Thread-60] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@7f0f84d4{/,null,UNAVAILABLE} 2019-03-11T22:04:14,619 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL] 2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer.stop()] on object[org.apache.druid.query.lookup.LookupResourceListenerAnnouncer@115ca7de]. 2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer - Unannouncing start time on [/druid/listeners/lookups/__default/http:localhost:8083] 2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.query.lookup.LookupReferencesManager.stop()] on object[org.apache.druid.query.lookup.LookupReferencesManager@5ad1904f]. 2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.query.lookup.LookupReferencesManager - LookupReferencesManager is stopping. 2019-03-11T22:04:14,620 INFO [LookupReferencesManager-MainThread] org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop exited, Lookup notices are not handled anymore. 2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.query.lookup.LookupReferencesManager - Closing lookup [simple-wiki] 2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.server.lookup.namespace.cache.CacheScheduler - Closing namespace [UriExtractionNamespace{uri=file:/Users/clint/workspace/data/druid/lookups/wiki-simple.json, uriPrefix=null, namespaceParseSpec=ObjectMapperFlatDataParser{}, fileRegex='null', pollPeriod=PT5M}] : org.apache.druid.server.lookup.namespace.cache.CacheScheduler$EntryImpl@3ec01ef5 2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.query.lookup.LookupReferencesManager - LookupReferencesManager is stopped. 2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.discovery.DruidLeaderClient.stop()] on object[org.apache.druid.discovery.DruidLeaderClient@642a16aa]. 2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.discovery.DruidLeaderClient - Stopped. 2019-03-11T22:04:14,621 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[org.apache.druid.curator.discovery.ServerDiscoverySelector@77f991c]. 2019-03-11T22:04:14,622 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider.stop()] on object[org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider@6968c1d6]. 2019-03-11T22:04:14,623 INFO [Thread-60] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - stopping 2019-03-11T22:04:14,623 INFO [Thread-60] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - stopped 2019-03-11T22:04:14,623 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.http.client.NettyHttpClient.stop()] on object[org.apache.druid.java.util.http.client.NettyHttpClient@27abb6ca]. 2019-03-11T22:04:14,641 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.server.coordination.ZkCoordinator.stop()] on object[org.apache.druid.server.coordination.ZkCoordinator@6413d7e7]. 2019-03-11T22:04:14,641 INFO [Thread-60] org.apache.druid.server.coordination.ZkCoordinator - Stopping ZkCoordinator for [DruidServerMetadata{name='localhost:8083', hostAndPort='localhost:8083', hostAndTlsPort='null', maxSize=10000000000, tier='_default_tier', type=historical, priority=0}] 2019-03-11T22:04:14,641 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.server.coordination.SegmentLoadDropHandler.stop()] on object[org.apache.druid.server.coordination.SegmentLoadDropHandler@6c479fdf]. 2019-03-11T22:04:14,641 INFO [Thread-60] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopping... 2019-03-11T22:04:14,642 INFO [Thread-60] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopped. 2019-03-11T22:04:14,642 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.metrics.MonitorScheduler.stop()] on object[org.apache.druid.java.util.metrics.MonitorScheduler@3f5156a6]. 2019-03-11T22:04:14,642 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.client.cache.CaffeineCache.close() throws java.io.IOException] on object[org.apache.druid.client.cache.CaffeineCache@2b867cd3]. 2019-03-11T22:04:14,644 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[ServiceEmitter{serviceDimensions={service=druid/historical, host=localhost:8083, version=}, emitter=LoggingEmitter{log=Logger{name=[org.apache.druid.java.util.emitter.core.LoggingEmitter], class[class org.apache.logging.slf4j.Log4jLogger]}, level=INFO}}]. 2019-03-11T22:04:14,644 INFO [Thread-60] org.apache.druid.java.util.emitter.core.LoggingEmitter - Close: started [false] 2019-03-11T22:04:14,644 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.emitter.core.LoggingEmitter.close()] on object[LoggingEmitter{log=Logger{name=[org.apache.druid.java.util.emitter.core.LoggingEmitter], class[class org.apache.logging.slf4j.Log4jLogger]}, level=INFO}]. 2019-03-11T22:04:14,644 INFO [Thread-60] org.apache.druid.curator.CuratorModule - Stopping Curator 2019-03-11T22:04:14,645 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting 2019-03-11T22:04:14,654 INFO [Thread-60] org.apache.zookeeper.ZooKeeper - Session: 0x10003528b320020 closed 2019-03-11T22:04:14,654 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x10003528b320020 2019-03-11T22:04:14,654 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [INIT] 2019-03-11T22:04:14,655 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner.stop()] on object[org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner@2d760326]. ``` and for http segment management: ``` historical: 2019-03-11T22:07:29,006 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Lifecycle [module] running shutdown hook 2019-03-11T22:07:29,008 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS] 2019-03-11T22:07:29,010 INFO [Thread-60] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannouncing [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/historical', host='localhost', bindOnHost=false, port=-1, plaintextPort=8083, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='HISTORICAL', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=10000000000, type=historical, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}]. 2019-03-11T22:07:29,010 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/internal-discovery/HISTORICAL/localhost:8083] 2019-03-11T22:07:29,034 INFO [Thread-60] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/historical', host='localhost', bindOnHost=false, port=-1, plaintextPort=8083, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='HISTORICAL', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=10000000000, type=historical, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}]. 2019-03-11T22:07:29,034 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.announcement.Announcer.stop()] on object[org.apache.druid.curator.announcement.Announcer@5a4dda2]. 2019-03-11T22:07:29,034 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - Stopping announcer 2019-03-11T22:07:29,036 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/localhost:8083] broker (http segment discovery): 2019-03-11T22:07:29,032 INFO [CuratorDruidNodeDiscoveryProvider-ListenerExecutor] org.apache.druid.client.HttpServerInventoryView - Server[localhost:8083] disappeared. 2019-03-11T22:07:29,032 INFO [CuratorDruidNodeDiscoveryProvider-ListenerExecutor] org.apache.druid.server.coordination.ChangeRequestHttpSyncer - Stopped ChangeRequestHttpSyncer[http://localhost:8083/_1552342037182]. 2019-03-11T22:07:29,038 INFO [BrokerServerView-0] org.apache.druid.sql.calcite.schema.DruidSchema - Removed all metadata for dataSource[wikiticker]. historical: 2019-03-11T22:07:29,039 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/listeners/lookups/__default/http:localhost:8083] 2019-03-11T22:07:29,042 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/segments/localhost:8083/localhost:8083_historical__default_tier_2019-03-11T22:07:05.488Z_02242ea50c0f44fda31d2379ca1db7260] 2019-03-11T22:07:29,045 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER] 2019-03-11T22:07:29,045 INFO [Thread-60] org.apache.druid.server.initialization.jetty.JettyServerModule - Waiting 15000 ms for unannouncement to propagate. 2019-03-11T22:07:44,049 INFO [Thread-60] org.apache.druid.server.initialization.jetty.JettyServerModule - Stopping Jetty Server... 2019-03-11T22:07:44,061 INFO [Thread-60] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@7e5efcab{HTTP/1.1,[http/1.1]}{0.0.0.0:8083} ```
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
