[ 
https://issues.apache.org/jira/browse/CURATOR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846280#comment-17846280
 ] 

Roelof Naude commented on CURATOR-705:
--------------------------------------

I have managed to isolate the root cause: the failure is triggered by using a 
custom ThreadPoolExecutor as the run-safe service.

 

The ThreadPoolExecutor is fixed-size with 2 threads. The same test repeated 
with Executors.newFixedThreadPool failed in the same way.

 

We picked this up because of a deadlock caused by the default, single-threaded 
run-safe service. One of our apps is notified on the default run-safe service 
and attempts to perform service discovery from within that callback, so the 
service discovery runs in the context of runSafeService. Subsequent events are 
blocked waiting for the initial event to complete, which it cannot, as it is 
itself blocked in service discovery.
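
The blocking pattern described above can be reproduced with plain JDK 
executors, without Curator. The following is only a minimal sketch: the class 
and method names are hypothetical, the single-threaded executor is a stand-in 
for the default run-safe service, and the inner task is a stand-in for the 
follow-up event that service discovery waits on. A timeout is used so the 
sketch terminates instead of hanging.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RunSafeDeadlockSketch {

    /**
     * Returns true if a task running on a single-threaded executor timed out
     * while waiting for follow-up work it queued on that same executor.
     */
    static boolean nestedSubmitStalls(long timeoutMillis) throws Exception {
        // Stand-in (assumption) for the default single-threaded run-safe service.
        ExecutorService runSafe = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> outer = runSafe.submit(() -> {
                // Stand-in for the event that service discovery waits for.
                // It is queued behind the task that is currently running.
                Future<String> inner = runSafe.submit(() -> "instances loaded");
                try {
                    inner.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return false; // follow-up ran: no stall
                } catch (TimeoutException e) {
                    return true;  // the only thread is busy running this task
                }
            });
            return outer.get();
        } finally {
            runSafe.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stalled: " + nestedSubmitStalls(200));
    }
}
```

Without the timeout on the inner wait, the outer task would never return, 
which matches the deadlock we observed.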

 

PathChildrenCache used to run these events in a separate executor. Specifying 
an executor for ServiceProviderBuilder now has no effect and results in a 
warning being logged:
{code:java}
CuratorCache does not support custom ExecutorService{code}
 

For now we run the event handling in a separate thread, but this is not ideal.
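
A rough sketch of that kind of workaround, again using only JDK executors: the 
notification thread does nothing but hand the work to an application-owned 
executor, so it stays free to deliver the follow-up event. All names here are 
hypothetical, and both executors are stand-ins rather than Curator's own 
objects.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class OffloadHandlerSketch {

    /**
     * Returns true if the handler finishes once its blocking work is moved
     * off the single notification thread.
     */
    static boolean offloadedHandlerCompletes(long timeoutMillis) throws Exception {
        // Stand-in (assumption) for the single-threaded run-safe service.
        ExecutorService runSafe = Executors.newSingleThreadExecutor();
        // Application-owned executor for the blocking part of the handler.
        ExecutorService workers = Executors.newSingleThreadExecutor();
        CountDownLatch done = new CountDownLatch(1);
        try {
            // The notification only hands off the work and returns at once.
            runSafe.execute(() -> workers.execute(() -> {
                try {
                    // Stand-in for service discovery waiting on a later event.
                    Future<String> followUp = runSafe.submit(() -> "instances loaded");
                    followUp.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    done.countDown();
                } catch (Exception e) {
                    // Latch stays uncounted so the caller sees the failure.
                }
            }));
            return done.await(timeoutMillis * 2, TimeUnit.MILLISECONDS);
        } finally {
            workers.shutdownNow();
            runSafe.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("completed: " + offloadedHandlerCompletes(500));
    }
}
```

The cost is that the handler no longer runs in event order relative to other 
notifications, which is part of why this is not ideal.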

> ServiceCache::getInstances do not return any instances
> ------------------------------------------------------
>
>                 Key: CURATOR-705
>                 URL: https://issues.apache.org/jira/browse/CURATOR-705
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: General
>    Affects Versions: 5.6.0
>         Environment: linux, java 21
>            Reporter: Roelof Naude
>            Priority: Major
>         Attachments: curator-x-discovery.patch, 
> testInitialLoadUsingExecutor.patch
>
>
> We've run into an issue with service discovery after upgrading from 4.3.0 to 
> 5.6.0.
> ServiceCache::getInstances does not return any instances of a service. 
> Restarting the back-end services does allow ServiceCache to detect the 
> instance. I have managed to simulate this scenario in the test cases.
>  
> TestServiceCache::testInitialLoad was modified to register instance1 before 
> the cache has been started. An assert immediately after cache.start detects 
> the failure, i.e.:
> {code:java}
> assertEquals(1, cache.getInstances().size());{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
