[ https://issues.apache.org/jira/browse/CURATOR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846280#comment-17846280 ]
Roelof Naude commented on CURATOR-705:
--------------------------------------

I have managed to isolate the root cause: it is due to using a custom ThreadPoolExecutor as the runSafeService. The ThreadPoolExecutor is fixed-size with 2 threads, and repeating the same test with Executors.newFixedThreadPool produced the same failure.

We came across this because of a deadlock caused by the default, single-threaded runSafeService. One of our apps is notified via the default runSafeService and then attempts to perform service discovery. That discovery runs in the context of the runSafeService, so subsequent events are blocked waiting for the initial event to complete, which it cannot do because it is itself blocked in service discovery. PathChildrenCache used to run these events in a separate executor.

Specifying an executor for ServiceProviderBuilder has no effect and results in a warning being logged:
{code:java}
CuratorCache does not support custom ExecutorService{code}
For now we run the event in a separate thread, but this is not ideal.

> ServiceCache::getInstances do not return any instances
> ------------------------------------------------------
>
> Key: CURATOR-705
> URL: https://issues.apache.org/jira/browse/CURATOR-705
> Project: Apache Curator
> Issue Type: Bug
> Components: General
> Affects Versions: 5.6.0
> Environment: linux, java 21
> Reporter: Roelof Naude
> Priority: Major
> Attachments: curator-x-discovery.patch, testInitialLoadUsingExecutor.patch
>
>
> We've run into an issue with service discovery after upgrading from 4.3.0 to 5.6.0: ServiceCache::getInstances does not return any instances of a service. Restarting the back-end services does allow ServiceCache to detect the instance. I have managed to simulate this scenario in the test cases.
>
> TestServiceCache::testInitialLoad was modified to register instance1 before the cache has been started.
> An assert immediately after cache.start() detects the failure, i.e.:
> {code:java}
> assertEquals(1, cache.getInstances().size());{code}
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
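The deadlock described in the comment can be modelled without ZooKeeper. The following is a minimal sketch that uses only java.util.concurrent. It assumes that the default runSafeService behaves like a single-threaded executor and that the discovery call ends up waiting on a task queued to that same executor. The class and the methods discover, inlineListener and handOffListener are hypothetical illustrations, not Curator APIs. The timeouts exist only so that the demonstration terminates.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RunSafeDeadlockSketch {

    // Hypothetical stand-in for a discovery call: its answer is produced by a task
    // that must itself run on the runSafe executor (assumption, see lead-in).
    public static boolean discover(ExecutorService runSafe, long timeoutMs) {
        Future<String> instances = runSafe.submit(() -> "instance1");
        try {
            instances.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            instances.cancel(true);
            return false; // without a timeout this wait would never end
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // The listener performs discovery directly on the single runSafe thread,
    // so the task it waits for is stuck in the queue behind it.
    public static boolean inlineListener() {
        ExecutorService runSafe = Executors.newSingleThreadExecutor();
        try {
            Callable<Boolean> event = () -> discover(runSafe, 200);
            return runSafe.submit(event).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            runSafe.shutdownNow();
        }
    }

    // Workaround: the listener hands the blocking work to a separate executor and
    // returns at once, freeing the runSafe thread to run the follow-up task.
    public static boolean handOffListener() {
        ExecutorService runSafe = Executors.newSingleThreadExecutor();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        try {
            runSafe.execute(() -> worker.execute(() -> result.complete(discover(runSafe, 2000))));
            return result.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdownNow();
            runSafe.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("inline listener found instances: " + inlineListener());
        System.out.println("hand-off listener found instances: " + handOffListener());
    }
}
```

In inlineListener the follow-up task is queued behind the listener that is waiting for it, so discover times out and the method returns false. In handOffListener the listener only schedules the work on a separate worker executor, which frees the runSafe thread, and discover returns true. This mirrors the "run the event in a separate thread" workaround from the comment. How such a worker executor should be sized and shut down in a real application is left open here.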