Hi Reka, Akila,

Thanks for the response.
On the system where we encountered the issue we are still running the Java cartridge agent (JCA). I checked the commit as suggested by Akila and we do have it. The secureEvictionTimePeriod is set to the default (5500 ms). A few questions:

* STRATOS-739 states (see the quote below) that the secureEvictionTimePeriod needs to be tuned according to the publishing period. How would I determine the publishing period?
* I don't see the config file (thrift-agent-config.xml) in the JCA package (client side). Where should we set the time period, server side or client side?
* Is this configuration (or something similar) expected in PCA as well? (We are migrating over, but it's taking some time.)

Thanks
Martin

Quote from the doc: "... If you are publishing events in a periodic interval as more than 5.5s, you need to tune the <secureEvictionTimePeriod> parameter accordingly ..."

From: Reka Thirunavukkarasu [mailto:r...@wso2.com]
Sent: Sunday, September 13, 2015 10:49 PM
To: dev
Cc: Chamila De Alwis
Subject: Re: Stratos 4.1: "Too many open files" issue

Hi Martin,

What cartridge agent are you using currently? Is it Java or Python? This problem was identified on the Thrift data publisher side in Java. Since the Python agent uses a different approach to connect to the data receiver, we will need to verify whether the Python agent has this particular issue fixed. If you could explain which clients are connecting to Stratos using Thrift, then we can check the Thrift agent version as Akila mentioned and find out the root cause of this issue.

Thanks,
Reka

On Sun, Sep 13, 2015 at 12:46 PM, Akila Ravihansa Perera <raviha...@wso2.com> wrote:

Hi Martin,

I think we fixed this problem by uplifting the Thrift agent feature in [1]. The root cause of the issue was that after periodic publishing, the Thrift agent fails to evict the old connection, as reported in [2]. The fix is described in [3]. Your stack trace looks very similar to what has been reported in the JIRA.
Can you check whether you have this fix applied?

[1] https://github.com/apache/stratos/commit/8985d96eb811aa8e9ce2c114f1856b4c4e20517b
[2] https://issues.apache.org/jira/browse/STRATOS-723
[3] https://issues.apache.org/jira/browse/STRATOS-739

Thanks.

On Sun, Sep 13, 2015 at 10:56 AM, Imesh Gunaratne <im...@apache.org> wrote:

Hi Martin,

I believe you are using 4.1.0-RC4 with some custom changes you have made locally. Will you be able to test this on the latest commit of the stratos-4.1.x branch (without any other changes)? I cannot recall a fix we did for this after 4.1.0-RC4, but it would be better if you can verify with the latest code in the stratos-4.1.x branch.

At the same time, will you be able to do the following:

* Take a thread dump of the running Stratos and CEP instances once this happens
* Check the file descriptor limits of the OS

Thanks

On Sat, Sep 12, 2015 at 10:56 PM, Martin Eppel (meppel) <mep...@cisco.com> wrote:

Resending in case it got lost.

Thanks
Martin

From: Martin Eppel (meppel)
Sent: Thursday, September 10, 2015 2:39 PM
To: dev@stratos.apache.org
Subject: Stratos 4.1: "Too many open files" issue

Hi,

We are seeing an issue with Stratos running out of file handles when creating a number of applications and VM instances. The scenario is as follows: 13 applications are deployed, each with a single cluster and a single member instance. As the VMs spin up, Stratos becomes unresponsive, and checking the logs we find the exceptions below. I remember we saw similar issues (same exceptions) back in Stratos 4.0 in the context of longevity tests.

We are running Stratos 4.1 RC4 with the latest commit:

commit 0fd41840fb04d92ba921bf58c59c2c3fbad0c561
Author: Imesh Gunaratne <im...@apache.org>
Date: Tue Jul 7 12:54:47 2015 +0530

Is this a known issue which might have been fixed in a later commit, or something new?
Can we verify that the fixes for the previous issues are included in our system (jars, commits, etc.)?

org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
    at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
    at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
    at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
    at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
    at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
    at java.lang.Thread.run(Thread.java:745)

TID: [0] [STRATOS] [2015-08-17 17:38:17,499] WARN {org.apache.thrift.server.TThreadPoolServer} - Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
    at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
    at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
    at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
    at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
    at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Too many open files

// Listing the applications, member instances, and cartridge state ([di-000-xxx] = application name):

di-000-010: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-011: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Initialized 1)
cartridge-proxy: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
di-000-001: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
di-000-002: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
di-000-012: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Created 1)
di-000-003: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-004: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-006: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-005: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-008: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-007: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-009: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)

--
Imesh Gunaratne
Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

--
Akila Ravihansa Perera
WSO2 Inc.; http://wso2.com/
Blog: http://ravihansa3000.blogspot.com

--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.: http://wso2.com
Mobile: +94776442007
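[Editor's note] For anyone diagnosing a similar "Too many open files" condition later: the two checks Imesh suggests in the thread (thread dump, file descriptor limits) can be gathered with standard tools. The sketch below assumes a Linux host with /proc mounted and a JDK on the PATH; the PID lookup via pgrep is an illustrative assumption — substitute however you normally locate the Carbon/Stratos JVM. It is not part of any Stratos tooling.

```shell
# Locate the Stratos (Carbon) JVM. The pgrep pattern is a guess; fall back to
# the current shell's PID ($$) so the snippet also runs standalone.
pid=$(pgrep -f carbon | head -n 1)
pid=${pid:-$$}

# File-descriptor limits in effect for new processes (soft and hard).
echo "soft nofile limit: $(ulimit -Sn)"
echo "hard nofile limit: $(ulimit -Hn)"

# Count the descriptors the process currently holds open. A count that grows
# steadily under constant load points at a connection leak like the one
# reported in STRATOS-723 / fixed in STRATOS-739.
fd_count=$(ls "/proc/$pid/fd" | wc -l)
echo "PID $pid holds $fd_count open file descriptors"

# Thread dump of the running JVM (jstack ships with the JDK), uncomment to use:
#   jstack "$pid" > "stratos-threads-$(date +%s).txt"
```

Comparing fd_count against the soft limit over time shows how close the receiver is to exhausting its descriptors before the TTransportException starts appearing.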