On Apr 27, 2012, at 4:23 AM, Felix Meschberger wrote:

> Hi,
> 
> On Apr 20, 2012, at 00:19, David Jencks wrote:
> 
>> We've run into one definite concurrency problem in SCR. I've been 
>> discussing offline with a colleague how to fix it, and I wanted to get the 
>> discussion out in the open.
>> 
>> The original symptom was when 2 mandatory service refs were satisfied on 
>> different threads at once: the 2nd wasn't recognized so the component never 
>> got activated.
>> 
>> This is easily solved by synchronizing but this introduces risk of deadlocks 
>> (my first attempt, 
>> https://issues.apache.org/jira/secure/attachment/12522537/FELIX-3456-1.diff)
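For readers following along, here is a minimal Java sketch of what "synchronizing" means for the original symptom, assuming a hypothetical component that tracks its unsatisfied mandatory references in a set (SynchronizedActivationSketch and serviceAdded are illustrative names, not Felix SCR classes). The check and the activation happen under one monitor, which removes the lost activation; the comment marks where the deadlock risk comes in.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of one component's mandatory references.
// Class and method names are illustrative, not Felix SCR's real API.
public class SynchronizedActivationSketch {
    enum State { UNSATISFIED, ACTIVE }

    private final Set<String> unsatisfiedRefs = new HashSet<>();
    private State state = State.UNSATISFIED;
    private int activations = 0;

    SynchronizedActivationSketch(Set<String> mandatoryRefs) {
        unsatisfiedRefs.addAll(mandatoryRefs);
    }

    // Called on whichever thread delivers the service event. The check
    // ("is every mandatory ref satisfied now?") and the act (activate) run
    // under one monitor, so two concurrent events cannot both miss the
    // moment the last reference becomes satisfied.
    synchronized void serviceAdded(String refName) {
        unsatisfiedRefs.remove(refName);
        if (unsatisfiedRefs.isEmpty() && state == State.UNSATISFIED) {
            // Deadlock risk: a real activate method runs user code while
            // this monitor is held; if that code registers a service whose
            // event must lock another component, lock cycles can form.
            activations++;
            state = State.ACTIVE;
        }
    }

    synchronized State state() { return state; }

    synchronized int activations() { return activations; }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedActivationSketch c =
                new SynchronizedActivationSketch(Set.of("refA", "refB"));
        Thread t1 = new Thread(() -> c.serviceAdded("refA"));
        Thread t2 = new Thread(() -> c.serviceAdded("refB"));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(c.state() + " activations=" + c.activations());
    }
}
```

Running main prints `ACTIVE activations=1` regardless of how the two threads interleave.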
> 
> Yes
> 
>> 
>> We tried some partly asynchronous approaches such as 
>> https://issues.apache.org/jira/secure/attachment/12523313/FELIX-3456-4.diff. 
>> Unless there's a timeout (presumably due to a deadlock), this gets all service 
>> events processed before the thread exits from its first call into SCR. 
>> However, this can result in service events being processed later than one 
>> expects, possibly on a different thread. On further thought we concluded 
>> that a service event must be processed fully before the service registration 
>> call returns. We therefore don't think any kind of asynchronous approach 
>> will work.
> 
> Yes. For activation, it might cause SCR to not finish processing before the 
> synchronous bundle event handling ends. More importantly, though, unbinding 
> services must be handled synchronously to prevent errors in the components 
> caused by SCR calling the unbind methods when the bound service object is 
> already invalid.
> 
> 
>> 
>> We've discovered the anti-circular-dependency clause in the spec (112.3.5) 
>> but it appears to be overly biased towards SCR-only graphs of services.  We 
>> are leaning towards thinking that SCR also needs to consider:
>> 
>> - an activate method registers a service that satisfies an optional 
>> dependency of a component being activated by SCR on the same thread.
>> - the same, except the activate method starts a new thread to register the 
>> service and waits for it to complete.
>> 
> 
> You can come up with lots of scenarios here. The underlying problem is always 
> that an event for the component may arrive while its state is changing. 
> This is particularly problematic during activation and deactivation (due to 
> missing dependencies).
> 
>> Another scenario to consider is
>> 
>> components C1 and C2 registering as services, each with an optional dynamic 
>> dependency on the other.  If one starts, and then the other, there is no 
>> problem, they both get references to the other.  If they both start at the 
>> same time in separate threads (either because they are in different bundles 
>> or because they get activated due to mandatory references being satisfied) 
>> and register the services while the other is in the Activating state, a 
>> simple lock over the service event processing will result in deadlock.  
>> Furthermore, to get the correct result, at least one of the services has to 
>> be bound while the component to which it is binding is in the Activating 
>> state.
> 
> Dynamic binding of optional services is not a big issue, because it is 
> known to happen at any time and such events are fully processed, with the 
> bind and unbind methods being called even during activation.
> 
>> 
>> It looks like the situation can be simplified a bit by considering, for 
>> service events, whether the dependency will result in a state change: if 
>> it's optional or mandatory but not the only satisfying service, it won't, 
>> but if it's mandatory and the first satisfying service, it will.  We can 
>> calculate this before calling any bind methods or activate methods.  After 
>> determining this, we know the final state of the component.
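A minimal Java sketch of that calculation, assuming each reference is summarized by whether it is mandatory and how many matching services are bound before the event is applied (the Reference record and the method names are hypothetical). It shows the per-reference check described here, plus a whole-component variant that also looks at the other mandatory references.

```java
import java.util.List;

// Hypothetical model of a component's references at the moment a service
// event arrives. boundCount = matching services bound BEFORE the event.
public class StateChangeSketch {
    record Reference(String name, boolean mandatory, int boundCount) {}

    // Per-reference view: an added service can only change state if the
    // reference is mandatory and this is its first satisfying service.
    static boolean addFlipsReference(Reference ref) {
        return ref.mandatory() && ref.boundCount() == 0;
    }

    // A removed service can only change state if the reference is
    // mandatory and the removed service was its only satisfying one.
    static boolean removeFlipsReference(Reference ref) {
        return ref.mandatory() && ref.boundCount() == 1;
    }

    // Whole-component view: the add makes the component satisfied only if
    // it flips this reference AND every other mandatory reference is bound.
    static boolean addSatisfiesComponent(List<Reference> refs, String refName) {
        boolean flips = false;
        for (Reference r : refs) {
            if (r.name().equals(refName)) {
                flips = addFlipsReference(r);
            } else if (r.mandatory() && r.boundCount() == 0) {
                return false; // some other mandatory reference is still missing
            }
        }
        return flips;
    }

    public static void main(String[] args) {
        List<Reference> refs = List.of(
                new Reference("db", true, 0),
                new Reference("log", false, 0),
                new Reference("cache", true, 1));
        System.out.println(addSatisfiesComponent(refs, "db"));  // true
        System.out.println(addSatisfiesComponent(refs, "log")); // false
    }
}
```

Since everything here is computed before any bind or activate method runs, the caller knows the target state up front.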
> 
> SCR already does this but it only considers the impact of the single 
> reference. It does not take any other references into account.
> 
>> 
>> We're considering whether some kind of 2-stage lock would work:
>> 
>> - one level can change the state and blocks all other threads
>> - the other level can't change the state and lets stuff like service 
>>   events for non-state-changing service references be processed according 
>>   to the final state of the component (e.g. activating will let bind 
>>   methods be called on the under-configuration object).
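One possible reading of the two levels, sketched in Java under stated assumptions: a ReentrantLock serializes state changes, the final state is published in a volatile field before the activate method runs, and optional-reference events only take a short bind lock that is never held across user code. TwoLevelLockSketch and its methods are hypothetical, and the sketch ignores deactivation, bundle events and the other cases still open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of one reading of the two-level idea: an exclusive
// lock for state changes, plus a short-lived lock for non-state-changing
// binds that is never held while user code (activate) runs.
public class TwoLevelLockSketch {
    enum State { UNSATISFIED, ACTIVE }

    private final ReentrantLock stateChangeLock = new ReentrantLock();
    private final Object bindLock = new Object();
    // Final state is published before the transition's user code runs,
    // so other threads can act on it without waiting for activate().
    private volatile State targetState = State.UNSATISFIED;
    private final List<String> bound = new ArrayList<>();

    // Level 1: state-changing operation, one thread at a time.
    void activate(Runnable userActivateMethod) {
        stateChangeLock.lock();
        try {
            targetState = State.ACTIVE;
            userActivateMethod.run(); // may cause service events on other threads
        } finally {
            stateChangeLock.unlock();
        }
    }

    // Level 2: event for an optional reference; it cannot change state, so it
    // does not wait for stateChangeLock. When the final state is ACTIVE it
    // binds on the instance still being activated. (Services present before
    // activation would be bound by activate() itself; omitted here.)
    void optionalServiceAdded(String service) {
        if (targetState == State.ACTIVE) {
            synchronized (bindLock) {
                bound.add(service);
            }
        }
    }

    List<String> boundServices() {
        synchronized (bindLock) {
            return List.copyOf(bound);
        }
    }

    public static void main(String[] args) {
        TwoLevelLockSketch c = new TwoLevelLockSketch();
        // The activate method registers a service on a new thread and waits
        // for it; if that event also needed stateChangeLock, this would deadlock.
        c.activate(() -> {
            Thread t = new Thread(() -> c.optionalServiceAdded("svc"));
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println(c.boundServices()); // [svc]
    }
}
```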
>> 
>> This does not yet consider bundle event driven state changes or deactivation 
>> or delayed component creation or service factories.
>> 
>> Comments and more scenarios to consider are more than welcome.
> 
> I would rather come back to a proposal I already made on the bug:
> 
> If a service or configuration event takes place while the component is in the 
> transient activating state, the event is placed into a special queue for 
> further processing. When the component exits the transient state, the queue is 
> checked for further actions to take place.
> 
> There is only a small number of situations:
> 
>   * Service added: This must be handled
>   * Service removed: Might deactivate the component immediately.
>   * Config update or delete: Might deactivate the component
> 
> The problem here is the removal of a service while the component is being 
> activated. When we queue this event and handle it later the service has 
> already gone and will be in an undefined/unusable state causing problems. But 
> there is probably not much we can do about this because the component might be 
> in the activate method and synchronizing at this point in time is risking 
> deadlocks.
> 
> Thus, I think the queue for post-processing while in the activating state sounds 
> like the most sensible thing to do (with some small remaining window for 
> things going wrong). This is as easy as implementing the deactivate and 
> activate methods in the Activating state to enqueue these requests.
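A minimal Java sketch of the queueing idea, assuming a hypothetical component with an explicit ACTIVATING state (DeferredEventSketch, Event and Kind are illustrative names, not the FELIX-3456 patch). Events that arrive during activation are queued and handled in arrival order once the activate method returns; the deactivation that real SCR might need on removal or configuration changes is only marked in comments.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of queueing events that arrive while a component is in
// the transient ACTIVATING state; names are illustrative, not the SCR patch.
public class DeferredEventSketch {
    enum State { UNSATISFIED, ACTIVATING, ACTIVE }
    enum Kind { SERVICE_ADDED, SERVICE_REMOVED, CONFIG_CHANGED }
    record Event(Kind kind, String detail) {}

    private State state = State.UNSATISFIED;
    private final Queue<Event> deferred = new ArrayDeque<>();
    private final List<String> log = new ArrayList<>();

    synchronized void onEvent(Event e) {
        if (state == State.ACTIVATING) {
            deferred.add(e); // returns before the work is done
            return;
        }
        handle(e);
    }

    void activate(Runnable userActivateMethod) {
        synchronized (this) {
            state = State.ACTIVATING;
            log.add("activate");
        }
        // No lock held while user code runs, so events from this or other
        // threads are queued instead of deadlocking. Error handling omitted.
        userActivateMethod.run();
        synchronized (this) {
            state = State.ACTIVE;
            Event e;
            while ((e = deferred.poll()) != null) {
                handle(e); // processed in arrival order, after the fact
            }
        }
    }

    // Caller must hold the monitor.
    private void handle(Event e) {
        switch (e.kind()) {
            case SERVICE_ADDED -> log.add("bind " + e.detail());
            case SERVICE_REMOVED -> log.add("unbind " + e.detail()); // may deactivate
            case CONFIG_CHANGED -> log.add("reconfigure " + e.detail()); // may deactivate
        }
    }

    synchronized List<String> log() {
        return List.copyOf(log);
    }

    public static void main(String[] args) {
        DeferredEventSketch c = new DeferredEventSketch();
        c.activate(() -> c.onEvent(new Event(Kind.SERVICE_ADDED, "svc")));
        System.out.println(c.log()); // [activate, bind svc]
    }
}
```

Note that onEvent returns to its caller before a queued event has been handled; a SERVICE_REMOVED event would therefore be processed after the service is already gone.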

I think this is what's implemented in 
https://issues.apache.org/jira/secure/attachment/12523061/FELIX-3456-3.diff.  I 
just don't think it works.  Either you return from some events without having 
done the work promised or you do it in a different order than expected.  Either 
way you're better off with locks and possible timeouts and exceptions.
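To make "locks and possible timeouts and exceptions" concrete, here is a hedged Java sketch of the C1/C2 mutual-dependency case using `ReentrantLock.tryLock(long, TimeUnit)` from java.util.concurrent. Each hypothetical component holds its own lock and then tries to take the other's with a timeout; LockTimeoutException, runMutualScenario and the other names are illustrative only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: per-component locks acquired with a timeout, so a
// lock cycle surfaces as an exception instead of a permanent hang.
public class TimedLockSketch {
    // Illustrative exception type, not part of any real SCR API.
    static final class LockTimeoutException extends RuntimeException {
        LockTimeoutException(String message) { super(message); }
    }

    static void lockOrFail(ReentrantLock lock, long millis, String name)
            throws InterruptedException {
        if (!lock.tryLock(millis, TimeUnit.MILLISECONDS)) {
            throw new LockTimeoutException("timed out waiting for lock of " + name);
        }
    }

    // C1 and C2 each hold their own lock and then need the other's, as in
    // the mutual optional dependency scenario. Returns the number of timeouts.
    static int runMutualScenario(long timeoutMillis) {
        ReentrantLock c1 = new ReentrantLock();
        ReentrantLock c2 = new ReentrantLock();
        CountDownLatch bothHeld = new CountDownLatch(2);
        AtomicInteger timeouts = new AtomicInteger();
        Thread t1 = new Thread(() -> activate(c1, c2, "C2", bothHeld, timeouts, timeoutMillis));
        Thread t2 = new Thread(() -> activate(c2, c1, "C1", bothHeld, timeouts, timeoutMillis));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return timeouts.get();
    }

    private static void activate(ReentrantLock own, ReentrantLock other, String otherName,
                                 CountDownLatch bothHeld, AtomicInteger timeouts,
                                 long timeoutMillis) {
        own.lock();
        try {
            bothHeld.countDown();
            bothHeld.await(); // both components are now "activating"
            lockOrFail(other, timeoutMillis, otherName);
            other.unlock();   // a real system would bind here
        } catch (LockTimeoutException e) {
            timeouts.incrementAndGet();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            own.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("timeouts: " + runMutualScenario(100));
    }
}
```

Because both locks are held before either attempt, at least one thread must time out before the other can proceed, so both threads terminate instead of hanging as they would with plain `lock()`.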

still thinking...
david jencks

> 
> Regards
> Felix
