On Thu, Jan 21, 2010 at 10:57 AM, ant elder <[email protected]> wrote:
> On Thu, Jan 21, 2010 at 10:37 AM, Simon Laws <[email protected]>
> wrote:
>> At the moment the code follows approach 2. However it's in no way
>> configurable. It was initially introduced in this way because of a
>> number of test cases that were failing when following these steps.
>>
>> test with multiple nodes starts
>> client node fires a message into a component using SCA API
>> In parallel the node with the service is coming up and populating the
>> registry
>> reference is asked to send a message to the service before the service
>> endpoint has been replicated around to all nodes
>>
>> Hence there is code in the tribes version of the replicated endpoint
>> registry.
>>
>> The nature of working in this distributed world is that we will always
>> encounter these failure cases. I'd like to maintain some configurable
>> resilience for the most common failures. Without it you may (or may
>> not) need to include retry logic around every use of a reference
>> depending on how that reference is wired which doesn't seem ideal.
>>
>> Simon
>>
>
> IMHO it would be far better to have the delay in the testcase as that
> is where it knows what its trying to do. Nodes are going to come and
> go over the lifetime of a domain and all the different bindings handle
> timeouts or non-existent services in their own way. If a service isn't
> available for a reference at the time of a request then the request
> should just fail, i don't see much success in having the runtime try
> to paper over that. Something which would be useful is having the
> runtime be more in control of the target endpoint being used so that
> each binding invoker its kept up-to-date as endpoints are updated in
> the endpoint registry so that bindings don't make requests to
> out-of-date endpoints.
>
That said... the Tuscany itest for the calculator-rmi sample has started failing a lot on Hudson. When Hudson is heavily loaded it can take so long to start up the service and reference contributions that, even with a 25 second delay, the service still isn't always up by the time the reference is invoked. I guess some way to configure retries would help with this, though as it's supposed to be a trivial introductory sample it wouldn't be great to have to introduce a lot of policy and intents into it, if that was the approach taken to configure the retries.

   ...ant
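
For what it's worth, here is a rough sketch of what the testcase-side alternative to a fixed delay could look like: a small helper that retries the reference invocation a bounded number of times. This is only an illustration under assumptions, not existing Tuscany code. The class name `RetryingInvoker`, the method `invokeWithRetry`, and its parameters are all made up for the example, and it uses Java 8 lambda syntax for brevity.

```java
import java.util.concurrent.Callable;

// Hypothetical test helper, not part of Tuscany: retries an invocation
// a bounded number of times instead of sleeping for one fixed delay.
public class RetryingInvoker {

    /**
     * Calls the action until it succeeds or maxAttempts attempts have been
     * made, sleeping delayMillis between failed attempts. If every attempt
     * fails, the exception from the last attempt is rethrown.
     */
    public static <T> T invokeWithRetry(Callable<T> action, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Remember the failure; the service may simply not be up yet.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }
}
```

In a test this might be used as `RetryingInvoker.invokeWithRetry(() -> calculator.add(3, 2), 10, 500)`, where `calculator` stands for whatever reference proxy the test already holds. It keeps the waiting in the testcase and out of the sample's composite, but it doesn't address the question of whether the runtime itself should offer configurable retries.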
