[ 
https://issues.apache.org/jira/browse/JCLOUDS-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059309#comment-14059309
 ] 

Ignasi Barrera commented on JCLOUDS-624:
----------------------------------------

Thanks for reporting [~luciano.sabenca]. As you pointed out, the problem is 
that the executor that is used to implement the calls with a timeout configured 
is the same one that is used to enqueue the {{getNode}} operations, and that is 
causing the deadlock.
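
To make the pattern concrete, here is a minimal sketch in plain Java. It is not jclouds code: the class name, the method name and the pool sizes are made up for illustration. A task running on a fixed pool submits a nested "HTTP" task to the same pool and waits for it with a timeout. It is the smallest instance of the problem, with one thread and one node.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SharedPoolStarvation {

    /** Returns true if the nested "HTTP" task finished before the timeout. */
    static boolean getNodeOnSharedPool(int poolSize, long timeoutMillis) throws Exception {
        final ExecutorService shared = Executors.newFixedThreadPool(poolSize);
        try {
            // Outer task: plays the role of one enqueued getNode operation.
            Callable<Boolean> getNode = () -> {
                // Inner task: plays the role of the HTTP call, submitted to the SAME pool.
                Callable<String> httpCall = () -> "node-data";
                Future<String> response = shared.submit(httpCall);
                try {
                    response.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    // No free thread was available to run the inner task.
                    return false;
                }
            };
            return shared.submit(getNode).get();
        } finally {
            shared.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pool of 1: " + getNodeOnSharedPool(1, 500));
        System.out.println("pool of 2: " + getNodeOnSharedPool(2, 500));
    }
}
```

With a single thread the outer task holds the only worker, so the inner task stays in the queue until the timeout fires. With a second thread the inner task can run. The same thing happens with N threads once N or more outer tasks are in flight.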

Instead of adding a new executor to jclouds, however, I'd suggest a different 
approach. Since jclouds 1.7 we've been removing much of the asynchronous code 
from jclouds and delegating that responsibility to the users. How async is 
achieved and which executors are used is a matter of application logic, 
not of jclouds, so we've been deprecating the async code to give that 
control to the applications.

Based on this, I'd suggest the following approach to fix the issue. The 
{{ListNodes}} strategy already has a method that accepts a custom executor (one 
created by the application outside jclouds), so:


* I'd remove the {{execute()}} method from the interface and implementation (it 
uses the jclouds "user executor") and leave only the method that explicitly 
requires an executor.
* I'd add a method to the {{ChefService}} interface called 
{{listNodesConcurrently}} that requires an executor and will delegate to the 
corresponding method of the {{ListNodes}} strategy.
* I'd keep the current {{listNodes()}} method in the {{ChefService}} but have 
it perform the operation sequentially, without concurrency.

This way all the async logic is a choice of the client application, and so will 
be the executor to be used. WDYT?
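
As a rough sketch of the shape I have in mind (plain Java, not the real jclouds interfaces: {{NodeApi}} and the {{String}} node type are hypothetical stand-ins, and the actual signatures would need to be decided in the pull request):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class ListNodesSketch {

    /** Hypothetical stand-in for the Chef API calls; not the real jclouds interface. */
    interface NodeApi {
        List<String> listNodeNames();
        String getNode(String name);
    }

    /** Sequential variant: fetches the nodes one by one on the calling thread. */
    static List<String> listNodes(NodeApi api) {
        List<String> nodes = new ArrayList<>();
        for (String name : api.listNodeNames()) {
            nodes.add(api.getNode(name));
        }
        return nodes;
    }

    /** Concurrent variant: the caller supplies (and owns) the executor. */
    static List<String> listNodesConcurrently(NodeApi api, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        List<Future<String>> futures = new ArrayList<>();
        for (String name : api.listNodeNames()) {
            Callable<String> task = () -> api.getNode(name);
            futures.add(executor.submit(task));
        }
        // Collect results in the same order as the node names.
        List<String> nodes = new ArrayList<>();
        for (Future<String> future : futures) {
            nodes.add(future.get());
        }
        return nodes;
    }
}
```

In this sketch the tasks handed to the caller's executor do not submit further work to that same executor, so they cannot wait on their own pool the way the nested submission does.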

It would also be good to apply this same pattern to the rest of the strategies, 
as there are others that take the same approach with the "jclouds user executor".

> Bug at ListNodes In ChefContext
> -------------------------------
>
>                 Key: JCLOUDS-624
>                 URL: https://issues.apache.org/jira/browse/JCLOUDS-624
>             Project: jclouds
>          Issue Type: Bug
>          Components: jclouds-chef
>    Affects Versions: 1.8.0
>            Reporter: Luciano Sabença
>            Assignee: Ignasi Barrera
>
> Hi.
> I've found a bug when using ListNodes in jclouds-chef. The problem is related 
> to the thread pool that is used when ListNodes is called. 
> When using ListNodes, for each node that exists in Chef, a new task is 
> submitted to the pool "USER_THREADS". Each one of these submitted tasks 
> generates another Runnable - to do the HTTP request -  and submits it to the 
> same thread pool. So, when we have a large number of nodes (a common scenario 
> when dealing with Chef), all threads in the pool are blocked, each waiting for 
> the task that performs its HTTP request, and no thread is left to 
> actually perform the HTTP requests. After the configured timeout, many 
> java.util.concurrent.TimeoutException instances are thrown.
> I am already working to fix it and I will release a pull request soon (maybe, 
> in a few days).
> Thanks



--
This message was sent by Atlassian JIRA
(v6.2#6252)
