Re: [Dev] [APPM] Depsync failure due to hazelcast issue

Manoj Kumara Mon, 18 May 2015 02:24:08 -0700

Hi Nuwan,

Please try with the attached patch. It is also included in the latest
kernel patch, patch0010.

 WSO2-CARBON-PATCH-4.2.0-1297.zip
<https://docs.google.com/a/wso2.com/file/d/0B-yMpNmsyVchNDZDLWVEVFl4VlU/edit?usp=drive_web>

Regards,
Manoj



*Manoj Kumara*
Software Engineer
WSO2 Inc. http://wso2.com/
*lean.enterprise.middleware*
Mobile: +94713448188

On Mon, May 18, 2015 at 2:48 PM, Afkham Azeez <[email protected]> wrote:

> Please upgrade to Hazelcast 3.4.2
>
> On Mon, May 18, 2015 at 9:12 AM, Nuwan Silva <[email protected]> wrote:
>
>> Yes, this is consistently reproducible. Anyway, I will check the configs
>> again and verify.
>>
>> Regards,
>> NuwanS.
>>
>> On Sat, May 16, 2015 at 2:34 PM, Dinusha Senanayaka <[email protected]>
>> wrote:
>>
>>> Hi Nuwan/Asanthi,
>>>
>>> We configured a 3-node cluster setup to test this scenario, and
>>> everything seems to work fine. The "Member left" notification propagates
>>> to all other nodes correctly, and dep-sync works without issue as members
>>> leave and join.
>>>
>>> Anyway, is this issue consistently reproducible in your setup? Your logs
>>> show a mismatch between the node that left and the node that dep-sync
>>> failed to connect to.
>>>
>>> Member left [cc9950e3-af57-48b1-9f45-b511f0f30863]: /192.168.48.2:4001
>>> Packet not sent to -> Address[192.168.48.5]:4000
>>>
>>> Maybe a network issue on the machine caused this?
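
The mismatch between the two quoted log lines above can be checked
mechanically. What follows is only a minimal Java sketch under stated
assumptions: the class name AddressMismatchCheck, the helper extractAddress,
and the regular expression are hypothetical (they are not WSO2 or Hazelcast
APIs), and the code assumes IPv4 addresses in the two formats shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Carbon or Hazelcast.
class AddressMismatchCheck {
    // Matches an IPv4 address followed by a port, tolerating the "]" that
    // Hazelcast prints between them in "Address[host]:port".
    private static final Pattern HOST_PORT =
            Pattern.compile("(\\d{1,3}(?:\\.\\d{1,3}){3})\\]?:(\\d+)");

    /** Returns "host:port" for the first address found in the line, or null. */
    static String extractAddress(String logLine) {
        Matcher m = HOST_PORT.matcher(logLine);
        return m.find() ? m.group(1) + ":" + m.group(2) : null;
    }

    public static void main(String[] args) {
        // The two log lines quoted in this thread.
        String left = extractAddress(
                "Member left [cc9950e3-af57-48b1-9f45-b511f0f30863]: /192.168.48.2:4001");
        String failed = extractAddress(
                "Packet not sent to -> Address[192.168.48.5]:4000");
        System.out.println("left=" + left + " failed=" + failed
                + " same=" + left.equals(failed));
    }
}
```

For the two lines above this prints
left=192.168.48.2:4001 failed=192.168.48.5:4000 same=false, i.e. both the
host and the port differ.
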
>>>
>>> Btw, we don't need to enable dep-sync for store/publisher. Enabling it
>>> only for the GW cluster is enough.
>>>
>>> Regards,
>>> Dinusha.
>>>
>>>
>>> On Thu, May 14, 2015 at 5:31 PM, Asanthi Kulasinghe <[email protected]>
>>> wrote:
>>>
>>>> Looping in Azeez and Dinusha
>>>>
>>>>
>>>> On Wed, May 13, 2015 at 2:33 PM, Nuwan Silva <[email protected]> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I came across the following issue: we see a depsync error when a
>>>>> single node in the cluster goes down and the other nodes do not
>>>>> learn of it until it is restarted.
>>>>> The observation was that when a node (node 1) in the cluster goes
>>>>> down, the other nodes (node 2 and node 3) do not know that node 1 is
>>>>> down, so they still try to synchronize their artifacts.
>>>>>
>>>>> Further, we observed that when the node that was down (node 1)
>>>>> restarts, the other nodes find out that node 1 was down.
>>>>>
>>>>> TID: [0] [AM] [2015-05-13 09:00:08,861]  INFO
>>>>> {org.wso2.carbon.core.clustering.hazelcast.wka.WKABasedMembershipScheme} -
>>>>> Member left [cc9950e3-af57-48b1-9f45-b511f0f30863]: /192.168.48.2:4001
>>>>> {org.wso2.carbon.core.clustering.hazelcast.wka.WKABasedMembershipScheme}
>>>>> TID: [0] [AM] [2015-05-13 09:00:08,877]  INFO
>>>>> {org.wso2.carbon.core.clustering.hazelcast.wka.WKABasedMembershipScheme} -
>>>>> WKA member Host:192.168.48.2, Remote Host:null, Port: 4001, HTTP:8281,
>>>>> HTTPS:8244, Domain: wso2.am.storepub.domain, Sub-domain:worker, 
>>>>> Active:true
>>>>> left cluster.
>>>>> {org.wso2.carbon.core.clustering.hazelcast.wka.WKABasedMembershipScheme}
>>>>> TID: [0] [AM] [2015-05-13 09:00:08,890]  INFO
>>>>> {org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent} -
>>>>> Elected this member [10bb1dec-4d25-4966-a39a-eb42e42d1706] as the
>>>>> Coordinator for the cluster [wso2.am.storepub.domain]
>>>>> {org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent}
>>>>> TID: [0] [AM] [2015-05-13 09:00:14,849]  INFO
>>>>> {org.wso2.carbon.core.clustering.hazelcast.wka.WKABasedMembershipScheme} -
>>>>> Member joined [bc438078-ee97-4643-aadb-cf7ba970dc7e]: /192.168.48.2:4001
>>>>> {org.wso2.carbon.core.clustering.hazelcast.wka.WKABasedMembershipScheme}
>>>>>
>>>>>
>>>>>
>>>>> *Depsync Error:*
>>>>>
>>>>> TID: [0] [AM] [2015-05-13 08:52:03,333] ERROR
>>>>> {org.wso2.carbon.core.deployment.CarbonDeploymentSchedulerTask} -
>>>>> Deployment synchronization commit for tenant -1234 failed
>>>>> {org.wso2.carbon.core.deployment.CarbonDeploymentSchedulerTask}
>>>>> com.hazelcast.core.HazelcastException:
>>>>> com.hazelcast.spi.exception.RetryableIOException: Packet not sent to ->
>>>>> Address[192.168.48.5]:4000
>>>>>     at com.hazelcast.util.ExceptionUtil.rethrow(ExceptionUtil.java:45)
>>>>>     at com.hazelcast.util.ExceptionUtil.rethrow(ExceptionUtil.java:40)
>>>>>     at
>>>>> com.hazelcast.map.proxy.MapProxySupport.containsKeyInternal(MapProxySupport.java:371)
>>>>>     at
>>>>> com.hazelcast.map.proxy.MapProxyImpl.containsKey(MapProxyImpl.java:215)
>>>>>     at
>>>>> org.wso2.carbon.core.clustering.hazelcast.HazelcastDistributedMapProvider$DistMap.containsKey(HazelcastDistributedMapProvider.java:119)
>>>>>     at
>>>>> org.wso2.carbon.caching.impl.CacheImpl.containsKey(CacheImpl.java:237)
>>>>>     at
>>>>> org.wso2.carbon.registry.core.caching.CacheBackedRegistry.resourceExists(CacheBackedRegistry.java:291)
>>>>>     at
>>>>> org.wso2.carbon.registry.core.session.UserRegistry.resourceExistsInternal(UserRegistry.java:777)
>>>>>     at
>>>>> org.wso2.carbon.registry.core.session.UserRegistry.access$800(UserRegistry.java:60)
>>>>>     at
>>>>> org.wso2.carbon.registry.core.session.UserRegistry$9.run(UserRegistry.java:760)
>>>>>     at
>>>>> org.wso2.carbon.registry.core.session.UserRegistry$9.run(UserRegistry.java:757)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at
>>>>> org.wso2.carbon.registry.core.session.UserRegistry.resourceExists(UserRegistry.java:757)
>>>>>     at
>>>>> org.wso2.carbon.deployment.synchronizer.internal.repository.CarbonRepositoryUtils.getDeploymentSyncConfigurationFromRegistry(CarbonRepositoryUtils.java:262)
>>>>>     at
>>>>> org.wso2.carbon.deployment.synchronizer.internal.repository.CarbonRepositoryUtils.getActiveSynchronizerConfiguration(CarbonRepositoryUtils.java:108)
>>>>>     at
>>>>> org.wso2.carbon.deployment.synchronizer.internal.DeploymentSynchronizerServiceImpl.commit(DeploymentSynchronizerServiceImpl.java:96)
>>>>>     at
>>>>> org.wso2.carbon.core.deployment.CarbonDeploymentSchedulerTask.deploymentSyncCommit(CarbonDeploymentSchedulerTask.java:207)
>>>>>     at
>>>>> org.wso2.carbon.core.deployment.CarbonDeploymentSchedulerTask.run(CarbonDeploymentSchedulerTask.java:128)
>>>>>     at
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>>>     at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>>>     at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>>     at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>     at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>> Caused by: com.hazelcast.spi.exception.RetryableIOException: Packet
>>>>> not sent to -> Address[192.168.48.5]:4000
>>>>>     at
>>>>> com.hazelcast.spi.impl.BasicInvocation.doInvoke(BasicInvocation.java:360)
>>>>>     at
>>>>> com.hazelcast.spi.impl.BasicInvocation.access$900(BasicInvocation.java:62)
>>>>>     at
>>>>> com.hazelcast.spi.impl.BasicInvocation$ReInvocationTask.run(BasicInvocation.java:549)
>>>>>     at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>     at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>     at
>>>>> com.hazelcast.util.executor.PoolExecutorThreadFactory$ManagedThread.run(PoolExecutorThreadFactory.java:59)
>>>>>     at ------ End remote and begin local stack-trace ------.(Unknown
>>>>> Source)
>>>>>     at
>>>>> com.hazelcast.spi.impl.BasicInvocation$InvocationFuture.resolveResponse(BasicInvocation.java:862)
>>>>>     at
>>>>> com.hazelcast.spi.impl.BasicInvocation$InvocationFuture.resolveResponseOrThrowException(BasicInvocation.java:795)
>>>>>     at
>>>>> com.hazelcast.spi.impl.BasicInvocation$InvocationFuture.get(BasicInvocation.java:698)
>>>>>     at
>>>>> com.hazelcast.spi.impl.BasicInvocation$InvocationFuture.get(BasicInvocation.java:676)
>>>>>     at
>>>>> com.hazelcast.map.proxy.MapProxySupport.containsKeyInternal(MapProxySupport.java:369)
>>>>>     ... 22 more
>>>>>
>>>>>
>>>>> Regards,
>>>>> NuwanS.
>>>>> --
>>>>>
>>>>>
>>>>> *Nuwan Silva*
>>>>> *Senior Software Engineer - QA*
>>>>> Mobile: +9477 980 4543
>>>>>
>>>>> WSO2 Inc.
>>>>> lean . enterprise . middleware.
>>>>> http://www.wso2.com
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list
>>>>> [email protected]
>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Asanthi Kulasinghe*
>>>> WSO2 Inc; http://www.wso2.com/.
>>>> Mobile: +94777355522
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Dinusha Dilrukshi
>>> Associate Technical Lead
>>> WSO2 Inc.: http://wso2.com/
>>> Mobile: +94725255071
>>> Blog: http://dinushasblog.blogspot.com/
>>>
>>
>>
>>
>> --
>>
>>
>> *Nuwan Silva*
>> *Senior Software Engineer - QA*
>> Mobile: +9477 980 4543
>>
>> WSO2 Inc.
>> lean . enterprise . middleware.
>> http://www.wso2.com
>>
>
>
>
> --
> *Afkham Azeez*
> Director of Architecture; WSO2, Inc.; http://wso2.com
> Member; Apache Software Foundation; http://www.apache.org/
> email: [email protected]
> cell: +94 77 3320919
> blog: http://blog.afkham.org
> twitter: http://twitter.com/afkham_azeez
> linked-in: http://lk.linkedin.com/in/afkhamazeez
>
> *Lean . Enterprise . Middleware*
>

_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
