Just saying that a node went OOM has no value. Also, seeing a Hazelcast error in the stack trace doesn't necessarily mean that Hazelcast caused your node to go OOM. You have to profile the node and find out why it is going OOM.
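As one possible starting point (a sketch, not a prescribed procedure), node3 can be started with the standard HotSpot flags `-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<dir>` so the JVM writes an .hprof file at the moment it hits the OOM, or a dump can be taken mid-test with `jmap -dump:live,format=b,file=node3.hprof <pid>`. The same thing can be done from code through the HotSpot-specific `HotSpotDiagnosticMXBean`, as below. The class name `HeapDumper` and the file name `node3-heap.hprof` are hypothetical. Opening the dump in a heap analyzer such as Eclipse MAT shows which objects dominate the heap, and that is what tells you whether Hazelcast or something else is actually filling it.

```java
import com.sun.management.HotSpotDiagnosticMXBean;

import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper: writes a heap dump of the current JVM so it can be
 * analysed offline in a profiler or heap analyzer.
 */
public final class HeapDumper {

    private HeapDumper() {
    }

    /** Writes an .hprof heap dump of this JVM to {@code target} and returns the path. */
    public static Path dumpHeap(Path target, boolean liveObjectsOnly) throws IOException {
        // HotSpotDiagnosticMXBean is specific to HotSpot JVMs (Oracle JDK / OpenJDK).
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap throws IOException if the file already exists, so remove a stale dump.
        Files.deleteIfExists(target);
        // Newer JDKs also reject file names that do not end in ".hprof".
        // liveObjectsOnly = true dumps only objects that are still reachable.
        diagnostics.dumpHeap(target.toString(), liveObjectsOnly);
        return target;
    }

    /** Prints current heap usage in MiB; a cheap signal to watch during a load test. */
    public static void printHeapUsage() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when no maximum is defined.
        System.out.printf("heap used=%d MiB, committed=%d MiB, max=%d MiB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }

    public static void main(String[] args) throws IOException {
        printHeapUsage();
        Path dump = dumpHeap(Paths.get(args.length > 0 ? args[0] : "node3-heap.hprof"), true);
        System.out.println("Heap dump written to " + dump.toAbsolutePath());
    }
}
```

In practice the JVM flags are usually the better choice for this kind of load test, because they capture the heap at the exact point of failure rather than at an arbitrary moment chosen by hand.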
On Thu, Nov 19, 2015 at 10:18 AM, Milinda Perera <[email protected]> wrote:

> Hi,
>
> In the BRS 220 snapshot (with the kernel upgraded to 442), in a clustered
> setup (3 nodes in our test), we ran a load test against a stateful rule
> service on one node (let's say node3) until it went OOM. The following
> errors were shown on two nodes:
>
>
> *CacheCleanup errors shown on one of the nodes that is working fine (in
> our case node2):*
> [2015-11-17 17:16:58,951] WARN {org.wso2.carbon.caching.impl.CacheImpl}
> - Exception occurred while expiring item from distributed cache. No
> response for 120000 ms. Aborting invocation! Invocation{
> serviceName='hz:impl:mapService',
> op=RemoveOperation{$cache.$domain[carbon.super]Claim.Cache.Manager#Claim.Cache},
> partitionId=64, replicaIndex=0, tryCount=250, tryPauseMillis=500,
> invokeCount=1, callTimeout=60000, target=Address[10.100.5.92]:4002,
> backupsExpected=0, backupsCompleted=0} No response has been received!
> backups-expected:0 backups-completed: 0
> [2015-11-17 17:21:08,971] ERROR
> {org.wso2.carbon.caching.impl.CacheCleanupTask} - Error occurred while
> running CacheCleanupTask
> com.hazelcast.core.OperationTimeoutException: No response for 120000 ms.
> Aborting invocation! Invocation{ serviceName='hz:impl:mapService',
> op=ClearOperation{}, partitionId=46, replicaIndex=0, tryCount=250,
> tryPauseMillis=500, invokeCount=1, callTimeout=60000,
> target=Address[10.100.5.92]:4002, backupsExpected=0, backupsCompleted=0} No
> response has been received! backups-expected:0 backups-completed: 0
> at com.hazelcast.spi.impl.operationservice.impl.Invocation.newOperationTimeoutException(Invocation.java:491)
> at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.waitForResponse(InvocationFuture.java:277)
> at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:224)
> at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:204)
> at com.hazelcast.spi.impl.operationservice.impl.InvokeOnPartitions.retryFailedPartitions(InvokeOnPartitions.java:131)
> at com.hazelcast.spi.impl.operationservice.impl.InvokeOnPartitions.invoke(InvokeOnPartitions.java:67)
> at com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.invokeOnAllPartitions(OperationServiceImpl.java:326)
> at com.hazelcast.map.impl.proxy.MapProxySupport.clearInternal(MapProxySupport.java:914)
> at com.hazelcast.map.impl.proxy.MapProxyImpl.clearInternal(MapProxyImpl.java:71)
> at com.hazelcast.map.impl.proxy.MapProxyImpl.clear(MapProxyImpl.java:532)
> at org.wso2.carbon.core.clustering.hazelcast.HazelcastDistributedMapProvider$DistMap.clear(HazelcastDistributedMapProvider.java:172)
> at org.wso2.carbon.caching.impl.CacheImpl.stop(CacheImpl.java:734)
> at org.wso2.carbon.caching.impl.CarbonCacheManager.removeCache(CarbonCacheManager.java:168)
> at org.wso2.carbon.caching.impl.CacheImpl.expire(CacheImpl.java:769)
> at org.wso2.carbon.caching.impl.CacheImpl.runCacheExpiry(CacheImpl.java:931)
> at org.wso2.carbon.caching.impl.CacheCleanupTask.run(CacheCleanupTask.java:61)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> at ------ End remote and begin local stack-trace ------.(Unknown Source)
> at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponse(InvocationFuture.java:384)
> at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponseOrThrowException(InvocationFuture.java:334)
> at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:225)
> at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:204)
> at com.hazelcast.spi.impl.operationservice.impl.InvokeOnPartitions.retryFailedPartitions(InvokeOnPartitions.java:131)
> at com.hazelcast.spi.impl.operationservice.impl.InvokeOnPartitions.invoke(InvokeOnPartitions.java:67)
> at com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.invokeOnAllPartitions(OperationServiceImpl.java:326)
> at com.hazelcast.map.impl.proxy.MapProxySupport.clearInternal(MapProxySupport.java:914)
> at com.hazelcast.map.impl.proxy.MapProxyImpl.clearInternal(MapProxyImpl.java:71)
> at com.hazelcast.map.impl.proxy.MapProxyImpl.clear(MapProxyImpl.java:532)
> at org.wso2.carbon.core.clustering.hazelcast.HazelcastDistributedMapProvider$DistMap.clear(HazelcastDistributedMapProvider.java:172)
> at org.wso2.carbon.caching.impl.CacheImpl.stop(CacheImpl.java:734)
> at org.wso2.carbon.caching.impl.CarbonCacheManager.removeCache(CarbonCacheManager.java:168)
> at org.wso2.carbon.caching.impl.CacheImpl.expire(CacheImpl.java:769)
> at org.wso2.carbon.caching.impl.CacheImpl.runCacheExpiry(CacheImpl.java:931)
> at org.wso2.carbon.caching.impl.CacheCleanupTask.run(CacheCleanupTask.java:61)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> [2015-11-17 17:23:49,753] WARN {org.wso2.carbon.caching.impl.CacheImpl}
> - Exception occurred while expiring item from distributed cache. No
> response for 120000 ms. Aborting invocation! Invocation{
> serviceName='hz:impl:mapService',
> op=RemoveOperation{$cache.$domain[carbon.super]registryCacheManager#REG_PATH_CACHE},
> partitionId=46, replicaIndex=0, tryCount=250, tryPauseMillis=500,
> invokeCount=1, callTimeout=60000, target=Address[10.100.5.92]:4002,
> backupsExpected=0, backupsCompleted=0} No response has been received!
> backups-expected:0 backups-completed: 0
> [2015-11-17 17:26:02,701] WARN {org.wso2.carbon.caching.impl.CacheImpl}
> - Exception occurred while expiring item from distributed cache.
> com.hazelcast.spi.exception.RetryableIOException: Packet not send to ->
> Address[10.100.5.92]:4002
> [2015-11-17 17:28:35,259] WARN {org.wso2.carbon.caching.impl.CacheImpl}
> - Exception occurred while expiring item from distributed cache.
> com.hazelcast.spi.exception.RetryableIOException: Packet not send to ->
> Address[10.100.5.92]:4002
> [2015-11-17 17:30:37,821] WARN {org.wso2.carbon.caching.impl.CacheImpl}
> - Exception occurred while expiring item from distributed cache.
> com.hazelcast.spi.exception.RetryableIOException: Packet not send to ->
> Address[10.100.5.92]:4002
>
>
> *And the following error messages on the node that goes OOM (node3):*
>
> java.lang.OutOfMemoryError: Java heap space
> [2015-11-17 17:43:07,427] ERROR
> {org.apache.tomcat.util.net.NioEndpoint$SocketProcessor} -
> java.lang.OutOfMemoryError: Java heap space
>
> java.lang.OutOfMemoryError: Java heap space
> [2015-11-17 17:43:19,246] ERROR
> {com.hazelcast.spi.impl.operationexecutor.classic.ClassicOperationExecutor}
> - [10.100.5.92]:4002 [wso2.carbon.domain] [3.5.2] Failed to process
> packet: Packet{header=1, isResponse=false, isOperation=true, isEvent=false,
> partitionId=90, conn=Connection [0.0.0.0/0.0.0.0:4002 -> null],
> endpoint=Address[10.100.5.92]:4001, live=false, type=MEMBER} on
> hz.wso2.carbon.domain.instance.partition-operation.thread-2
> java.lang.OutOfMemoryError: Java heap space
>
> java.lang.OutOfMemoryError: Java heap space
> [2015-11-17 17:43:22,764] ERROR
> {com.hazelcast.spi.impl.operationexecutor.classic.ClassicOperationExecutor}
> - [10.100.5.92]:4002 [wso2.carbon.domain] [3.5.2] Failed to process
> packet: Packet{header=1, isResponse=false, isOperation=true, isEvent=false,
> partitionId=110, conn=Connection [0.0.0.0/0.0.0.0:4002 -> null],
> endpoint=Address[10.100.5.92]:4001, live=false, type=MEMBER} on
> hz.wso2.carbon.domain.instance.partition-operation.thread-6
> java.lang.OutOfMemoryError: Java heap space
>
> FYI: The other two nodes are working fine and serving requests normally.
>
> What could be the reason?
>
> Thanks,
> Milinda
>
> --
> Milinda Perera
> Software Engineer;
> WSO2 Inc. http://wso2.com
> Mobile: (+94) 714 115 032
>
>

--
*Afkham Azeez*
Director of Architecture; WSO2, Inc.; http://wso2.com
Member; Apache Software Foundation; http://www.apache.org/

email: [email protected]
cell: +94 77 3320919
blog: http://blog.afkham.org
twitter: http://twitter.com/afkham_azeez
linked-in: http://lk.linkedin.com/in/afkhamazeez

*Lean . Enterprise . Middleware*
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
