[
https://issues.apache.org/jira/browse/JCLOUDS-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059309#comment-14059309
]
Ignasi Barrera commented on JCLOUDS-624:
----------------------------------------
Thanks for reporting [~luciano.sabenca]. As you pointed out, the problem is
that the executor that is used to implement the calls with a timeout configured
is the same one that is used to enqueue the {{getNode}} operations, and that is
causing the deadlock.
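The starvation can be reproduced outside jclouds with plain {{java.util.concurrent}}. The following is only a minimal sketch, not jclouds code: the class and method names are made up for illustration, and a trivial task stands in for the HTTP request behind {{getNode}}.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; none of these names exist in jclouds.
class SharedExecutorStarvation {

    /**
     * Runs an outer task on outerPool that submits an inner task to innerPool
     * and waits for it with a timeout. Returns true if that wait timed out.
     */
    static boolean innerCallTimesOut(ExecutorService outerPool, ExecutorService innerPool,
                                     long timeoutMillis) throws Exception {
        Future<Boolean> outer = outerPool.submit(() -> {
            // Stand-in for the HTTP request issued by getNode.
            Future<String> inner = innerPool.submit(() -> "node-data");
            try {
                inner.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                inner.cancel(true);
                return true;
            }
        });
        return outer.get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService shared = Executors.newFixedThreadPool(1);
        ExecutorService separate = Executors.newFixedThreadPool(1);
        try {
            // Same pool at both levels: the only worker is busy waiting,
            // so the inner task stays queued until the timeout fires.
            System.out.println("shared pool timed out: "
                    + innerCallTimesOut(shared, shared, 300));
            // Inner task on a different pool: it can run while the outer task waits.
            System.out.println("separate pools timed out: "
                    + innerCallTimesOut(shared, separate, 5000));
        } finally {
            shared.shutdownNow();
            separate.shutdownNow();
        }
    }
}
```

A single-thread pool is the smallest case; the same thing happens with N threads once N or more outer tasks are waiting at the same time.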
Instead of adding a new executor to jclouds, however, I'd suggest a different
approach. Since jclouds 1.7 we've been removing much of the asynchronous stuff
in jclouds, and delegating that responsibility to the users. How async is
achieved and which executors are used is a matter of application logic,
not of jclouds, so we've been deprecating the async stuff to give
control to the applications.
Based on this, I'd suggest the following approach to fix the issue. The
{{ListNodes}} strategy already has a method that accepts a custom executor (one
created by the application outside jclouds), so:
* I'd remove the {{execute()}} method from the interface and implementation (it
uses the jclouds "user executor") and leave only the method that explicitly
requires an executor.
* I'd add a method to the {{ChefService}} interface called
{{listNodesConcurrently}} that requires an executor and will delegate to the
corresponding method of the {{ListNodes}} strategy.
* Keep the current {{listNodes()}} method in the {{ChefService}} but perform
the operation in an iterative way, without concurrency.
This way all the async logic is a choice of the client application, and so will
be the executor to be used. WDYT?
It would also be good to apply this same pattern to the rest of the strategies,
as there are others that rely on the "jclouds user executor" in the same way.
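To make the proposal concrete, here is a rough sketch of the two variants. It is not the actual jclouds-chef code: {{NodeApi}} and {{ListNodesSketch}} are hypothetical stand-ins, nodes are represented as plain strings, and only the method names {{listNodes}} and {{listNodesConcurrently}} come from the suggestion above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the Chef API calls; not the real jclouds-chef interface. */
interface NodeApi {
    List<String> listNodeNames();
    String getNode(String name);
}

class ListNodesSketch {

    /** Iterative variant: fetches the nodes one by one on the calling thread. */
    static List<String> listNodes(NodeApi api) {
        List<String> nodes = new ArrayList<>();
        for (String name : api.listNodeNames()) {
            nodes.add(api.getNode(name));
        }
        return nodes;
    }

    /** Concurrent variant: uses only the executor supplied by the caller. */
    static List<String> listNodesConcurrently(NodeApi api, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        List<Future<String>> pending = new ArrayList<>();
        for (String name : api.listNodeNames()) {
            Callable<String> fetch = () -> api.getNode(name);
            pending.add(executor.submit(fetch));
        }
        // Collect results in the same order as the node names.
        List<String> nodes = new ArrayList<>();
        for (Future<String> result : pending) {
            nodes.add(result.get());
        }
        return nodes;
    }

    public static void main(String[] args) throws Exception {
        // Fake API used only to exercise the sketch.
        NodeApi api = new NodeApi() {
            public List<String> listNodeNames() { return Arrays.asList("web-1", "web-2", "db-1"); }
            public String getNode(String name) { return "node:" + name; }
        };
        ExecutorService appExecutor = Executors.newFixedThreadPool(2); // owned by the application
        try {
            System.out.println(listNodes(api));
            System.out.println(listNodesConcurrently(api, appExecutor));
        } finally {
            appExecutor.shutdown();
        }
    }
}
```

The point of the split is that the application creates, sizes and shuts down the executor, so jclouds never has to guess how much concurrency is safe.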
> Bug at ListNodes In ChefContext
> -------------------------------
>
> Key: JCLOUDS-624
> URL: https://issues.apache.org/jira/browse/JCLOUDS-624
> Project: jclouds
> Issue Type: Bug
> Components: jclouds-chef
> Affects Versions: 1.8.0
> Reporter: Luciano Sabença
> Assignee: Ignasi Barrera
>
> Hi.
> I've found a bug when using ListNodes in jclouds-chef. The problem is related
> to the thread pool that is used when ListNodes is called.
> When using ListNodes, for each node that exists in Chef, a new task is
> submitted to the pool "USER_THREADS". Each one of these submitted tasks
> generates another Runnable - to do the HTTP request - and submits it to the
> same thread pool. So, when we have a large number of nodes (a usual scenario
> when dealing with Chef), all threads in the pool are blocked, each waiting for
> the task responsible for its HTTP request, and no thread is left to
> actually perform the HTTP request. After the configured timeout, a lot of
> java.util.concurrent.TimeoutException errors are thrown.
> I am already working to fix it and I will release a pull request soon (maybe,
> in a few days).
> Thanks
--
This message was sent by Atlassian JIRA
(v6.2#6252)