[ 
https://issues.apache.org/jira/browse/ODE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787424#action_12787424
 ] 

Tammo van Lessen commented on ODE-647:
--------------------------------------

I could reproduce this bug on ODE trunk, even with Axis2 1.4.1. The issue is 
slightly different but still occurs. During a long debug session last night I 
was able to isolate it. See my findings below:

The issue was introduced by r790694, which basically fixes a memory leak 
with Axis2. Before this change set, Axis2 didn't complain about an existing 
service with the same name; instead it removed the old ServiceClient and 
registered the new one with a new ServiceGroupContext. This probably leaked 
some memory.

The bug itself is a synchronization issue in SoapExternalService caused by a 
static thread-local object that serves as a cache for ServiceClient objects. 
SoapExternalServices represent external partner services and are called by the 
engine to invoke those partners. The invocation itself is deferred and may be 
executed concurrently by multiple threads (pooled by an ExecutorService). See 
also ODE-382. So theoretically two concurrent workers can call the same partner 
with different (or even the same) operations, hence the workers should not use 
the same Axis2 ServiceClient, as this would cause strange side effects. From 
what I could get from the class's history, this scenario is the reason why the 
thread-local object was introduced.

Back to the bug: Having a thread-local cache basically means that each worker 
has its own cache. Unfortunately, we don't route external invocations for a 
given service to the same thread but instead get an arbitrary one from the pool.

Assume we have 2 of x threads {t1, t2} and 2 services {s1, s2}.
1. ODE calls s1, the pool chooses t1: (t1, s1).invoke
2. ODE calls s2, the pool chooses t2: (t2, s2).invoke

Now t1 has stored the ServiceClient for s1 in its thread-local cache and t2 the 
ServiceClient for s2.

3. ODE calls s1 again, but this time the pool returns t2: (t2, s1).invoke

Now the thread-local cache for t2 returns s2, we run into the if-branch in 
SoapExternalService.getServiceClient(), clean up and discard the 
s2-ServiceClient (why?) and re-create s1, which is still registered with Axis2, 
hence it bitterly complains.
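
To illustrate, here is a minimal, self-contained sketch of that sequence. It 
is not ODE code: the registry set only stands in for the Axis2 unique-name 
check, plain strings stand in for ServiceClient objects, and the class and 
method names are made up. Two single-thread executors play t1 and t2 so that 
the choice of thread is deterministic.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SingleSlotCacheDemo {
    // Stand-in for the Axis2 configuration: service names must be unique.
    static final Set<String> registry = ConcurrentHashMap.newKeySet();

    // Exactly one cached "client" per thread, like the static thread-local.
    static final ThreadLocal<String> cache = new ThreadLocal<>();

    static String getServiceClient(String service) {
        String cached = cache.get();
        if (cached == null || !cached.equals(service)) {
            if (cached != null) {
                registry.remove(cached); // clean up and discard the old client
            }
            if (!registry.add(service)) { // another thread still holds the name
                throw new IllegalStateException(
                        "Two services cannot have same name: " + service);
            }
            cache.set(service);
        }
        return service;
    }

    // Runs one invocation on the given worker and reports the outcome.
    static String run(ExecutorService worker, String service) {
        try {
            return worker.submit(() -> getServiceClient(service)).get();
        } catch (ExecutionException e) {
            return "FAULT: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Replays steps 1-3: (t1, s1), (t2, s2), (t2, s1).
    static String[] replay() {
        registry.clear();
        ExecutorService t1 = Executors.newSingleThreadExecutor();
        ExecutorService t2 = Executors.newSingleThreadExecutor();
        try {
            return new String[] { run(t1, "s1"), run(t2, "s2"), run(t2, "s1") };
        } finally {
            t1.shutdown();
            t2.shutdown();
        }
    }
}
```

In this model, replay() succeeds for the first two calls and reports a fault 
for the third, because t1's entry for s1 is still registered when t2 tries to 
create its own.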

Now the question is, what would be the best fix?

I see two possibilities:
  a) Fix thread-local caching. The current caching strategy is IMO not really 
helpful as it caches exactly one ServiceClient instance per thread, and the 
chance that the same thread interacts with this particular service again 
is not really high (depending on the pool size, /dev/urandom and the current 
weather conditions). An option would be to give each thread a cache set, so 
that each thread can hold multiple service clients (in the worst case one for 
each SoapExternalService). The services must be registered under different 
names so that concurrent workers interacting with the same external service 
use different service client instances. However, we have to find a way to 
release them again at some point in time.
  b) Drop the thread-local caching and synchronize access to the ServiceClient 
associated with SoapExternalService.
  c) you tell me.
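
A rough sketch of what a) and b) could look like, heavily simplified: a plain 
set stands in for the Axis2 unique-name check, strings stand in for 
ServiceClient objects, and all class and method names are hypothetical. It is 
only meant to make the two shapes concrete, not to suggest a patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Option a): every thread keeps a map of clients, one per service.
class PerThreadClients {
    // Stand-in for the Axis2 configuration, which rejects duplicate names.
    static final Set<String> registry = ConcurrentHashMap.newKeySet();
    static final AtomicLong counter = new AtomicLong();
    static final ThreadLocal<Map<String, String>> cache =
            ThreadLocal.withInitial(HashMap::new);

    static String getServiceClient(String service) {
        return cache.get().computeIfAbsent(service, s -> {
            // A unique suffix keeps the registered names distinct per worker.
            String name = s + "#" + counter.incrementAndGet();
            if (!registry.add(name)) {
                throw new IllegalStateException("duplicate name: " + name);
            }
            return name;
        });
    }

    // The open question in a): something has to call this on each worker.
    static void releaseAll() {
        registry.removeAll(cache.get().values());
        cache.remove();
    }
}

// Option b): one client per external service, access is serialized.
class SharedClient {
    private final String service;
    private String client; // created lazily, guarded by "this"

    SharedClient(String service) {
        this.service = service;
    }

    synchronized <T> T invoke(Function<String, T> call) {
        if (client == null) {
            client = service; // placeholder for creating the real client
        }
        return call.apply(client);
    }
}
```

With a) a thread reuses its own client for a service and never touches another 
thread's registration, at the price of up to threads x services clients that 
need releasing. With b) there is one client per service, but concurrent 
invocations of the same partner queue up behind the lock.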

As I currently don't know how expensive creation and storage of ServiceClient 
instances is, I'd like to know what you think. a), b), or even c)?

Thanks,
  Tammo


> Multiple consecutive invocations to a service might  incur an axis2.AxisFault 
> of  "two services cannot have same name".
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: ODE-647
>                 URL: https://issues.apache.org/jira/browse/ODE-647
>             Project: ODE
>          Issue Type: Bug
>    Affects Versions: 1.3.3
>         Environment: ODE1.3.3, Tomcat 5.5.9, Sun JVM 1.5, Windows XP SP3
>            Reporter: Wenfeng Zhao
>            Assignee: Alexis Midon
>            Priority: Critical
>             Fix For: 1.3.4
>
>         Attachments: ODE-647-outputs.txt, ODE647.zip
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Although versions 1.3.2 and 2.0 are OK, with ODE 1.3.3 it seems that 
> multiple consecutive invocations of the same component service might incur 
> the following exception:
> ERROR - GeronimoLog.error(108) | Error sending message to Axis2 for ODE mex {PartnerRoleMex#hqejbhcnphr4i3dscxlf10 [PID {http://scqr.bupt.edu.cn/solution}process_SyntheticBookService_sol2-12] calling null.operation1(...)}
> org.apache.axis2.AxisFault: Two services cannot have same name.  A service with the axis_service_for_{http://example.org/writerInfo}writerInfoService#writerInfoPort_hqejbhcnphr4i3dscxlf0r name already exists in the system.
>       at org.apache.axis2.client.ServiceClient.configureServiceClient(ServiceClient.java:172)
>       at org.apache.axis2.client.ServiceClient.<init>(ServiceClient.java:139)
>       at org.apache.ode.axis2.SoapExternalService.getServiceClient(SoapExternalService.java:281)
>       at org.apache.ode.axis2.SoapExternalService.invoke(SoapExternalService.java:140)
>       at org.apache.ode.axis2.MessageExchangeContextImpl.invokePartner(MessageExchangeContextImpl.java:52)
>       at org.apache.ode.bpel.engine.BpelRuntimeContextImpl.invoke(BpelRuntimeContextImpl.java:781)
>       at org.apache.ode.bpel.runtime.INVOKE.run(INVOKE.java:100)
>       at sun.reflect.GeneratedMethodAccessor58.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:585)
>       at org.apache.ode.jacob.vpu.JacobVPU$JacobThreadImpl.run(JacobVPU.java:451)
>       at org.apache.ode.jacob.vpu.JacobVPU.execute(JacobVPU.java:139)
>       at org.apache.ode.bpel.engine.BpelRuntimeContextImpl.execute(BpelRuntimeContextImpl.java:875)
>       at org.apache.ode.bpel.engine.BpelProcess.handleWorkEvent(BpelProcess.java:438)
>       at org.apache.ode.bpel.engine.BpelEngineImpl.onScheduledJob(BpelEngineImpl.java:439)
>       at org.apache.ode.bpel.engine.BpelServerImpl.onScheduledJob(BpelServerImpl.java:441)
>       at org.apache.ode.scheduler.simple.SimpleScheduler$4$1.call(SimpleScheduler.java:411)
>       at org.apache.ode.scheduler.simple.SimpleScheduler$4$1.call(SimpleScheduler.java:405)
>       at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:218)
>       at org.apache.ode.scheduler.simple.SimpleScheduler$4.call(SimpleScheduler.java:404)
>       at org.apache.ode.scheduler.simple.SimpleScheduler$4.call(SimpleScheduler.java:401)
>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:123)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
>       at java.lang.Thread.run(Thread.java:595)
> I also noted that a similar problem was discussed in 2007 (
> https://issues.apache.org/jira/browse/AXIS2-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476874
> ), but it is not clear to me whether the two problems are related.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
