Please review and merge the pull request below, which contains the fix:
https://github.com/wso2/carbon4-kernel/pull/73
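For anyone skimming the thread, here is a minimal, self-contained sketch of the behaviour described in the quoted mails below. It is not the code from the PR: RemovedEvent is a hypothetical stand-in for the EntryEvent the listener receives (assuming, as reported in the thread, that getValue() is null for a removal while getOldValue() holds the member that left), and plain host:port strings taken from the log stand in for member objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in only: this is NOT Hazelcast's EntryEvent and NOT the
// HazelcastGroupManagementAgent code. All names here are hypothetical.
class MemberRemovalSketch {

    /** Models a "removed" event as reported in the thread after the upgrade. */
    static final class RemovedEvent<V> {
        private final V oldValue;

        RemovedEvent(V oldValue) {
            this.oldValue = oldValue;
        }

        /** Assumed post-upgrade behaviour for removals: no current value. */
        V getValue() {
            return null;
        }

        /** The value the entry held before it was removed. */
        V getOldValue() {
            return oldValue;
        }
    }

    /** Old approach: remove(null) matches nothing, so the list is unchanged. */
    static List<String> removeUsingValue(List<String> connectedMembers,
                                         RemovedEvent<String> event) {
        List<String> updated = new ArrayList<>(connectedMembers);
        updated.remove(event.getValue());
        return updated;
    }

    /** Proposed approach: the old value identifies the member that left. */
    static List<String> removeUsingOldValue(List<String> connectedMembers,
                                            RemovedEvent<String> event) {
        List<String> updated = new ArrayList<>(connectedMembers);
        updated.remove(event.getOldValue());
        return updated;
    }

    public static void main(String[] args) {
        List<String> connected =
                Arrays.asList("172.31.7.214:4100", "172.31.0.128:4100");
        RemovedEvent<String> event = new RemovedEvent<>("172.31.7.214:4100");

        System.out.println("getValue():    " + removeUsingValue(connected, event));
        System.out.println("getOldValue(): " + removeUsingOldValue(connected, event));
    }
}
```

With the first method the departed member stays in the list, which matches the ELB continuing to fail over to a node that is already down; with the second it is dropped.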
Thanks.
/Gayashan

On Fri, Nov 7, 2014 at 10:50 AM, Gayashan Amarasinghe <[email protected]> wrote:

> Hi Kishanthan,
>
> On Fri, Nov 7, 2014 at 10:45 AM, Kishanthan Thangarajah <
> [email protected]> wrote:
>
>>
>> On Thu, Nov 6, 2014 at 9:00 PM, Gayashan Amarasinghe <[email protected]>
>> wrote:
>>
>>> Hi Kishanthan, all,
>>>
>>> This was a tricky situation, but I was able to identify the issue and
>>> fix it. It was caused by the new Hazelcast upgrade.
>>>
>>> Two collections of members are maintained in the
>>> HazelcastGroupManagementAgent: a Hazelcast distributed map shared by the
>>> cluster, which holds all the members in the cluster (the members map),
>>> and a connected members list, which is maintained per subdomain in the
>>> cluster. When a member leaves the cluster, a MemberEntryListener and a
>>> GroupMembershipListener (and some other listeners) get notified. The
>>> MemberEntryListener is notified whenever the members map changes. When a
>>> member leaves, the entryRemoved method of this listener also removes the
>>> member that just left from the connectedMembers list. The event it
>>> receives (EntryEvent) carries the member that left. In the current
>>> implementation this member is acquired from the EntryEvent as follows,
>>>
>>> entryEvent.getValue()
>>>
>>> So in the code we do this,
>>>
>>> connectedMembers.remove(entryEvent.getValue());
>>>
>>> In the previous Hazelcast version this returned the correct member.
>>> However, with the new Hazelcast version it returns null, so the
>>> connected members list does not get updated properly. This is caused by
>>> a fix in Hazelcast [1] [2].
>>>
>>> The TenantAwareLoadBalanceEndpoint in the ELB uses this connected
>>> members list to get the next application member to serve the incoming
>>> request. This is why the ELB kept trying to send requests to
>>> disconnected members and eventually became non-responsive.
>>>
>>> As a fix, I have identified that we can use
>>>
>>> entryEvent.getOldValue()
>>>
>>> to acquire the member that just left (Hazelcast issue [1] also suggests
>>> using it).
>>>
>>> WDYT?
>>>
>>
>> +1, looks like they have fixed the implementation properly and we should
>> use the above for the member removed event. Good findings :)
>> Also, I believe this only affects the member removed event type and we
>> don't have to change anything for member added events?
>>
>
> Thanks, and yes, this only affects the member removed event. No changes
> for the member added events.
>
> /Gayashan
>
>>
>>>
>>> I have created the JIRA [3] for this issue and will send the PR with the
>>> fix.
>>>
>>> [1] https://github.com/hazelcast/hazelcast/issues/3198
>>> [2] https://github.com/hazelcast/hazelcast/issues/3859
>>> [3] https://wso2.org/jira/browse/CARBON-15057
>>>
>>> Thanks.
>>> /Gayashan
>>>
>>> On Wed, Nov 5, 2014 at 5:47 PM, Kishanthan Thangarajah <
>>> [email protected]> wrote:
>>>
>>>> Gayashan, please share your latest findings on this.
>>>>
>>>> When we see the member left message, the current member list is updated
>>>> with that event (the member gets removed). So the above can occur if
>>>> that is not happening as expected. We should also compare the behaviour
>>>> with and without the Hazelcast upgrade.
>>>>
>>>> On Fri, Oct 31, 2014 at 5:30 PM, Gayashan Amarasinghe <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> For Carbon testing we have a worker-mgt cluster fronted by an ELB,
>>>>> with requests coming in continuously from a JMeter client. If one (or
>>>>> more) of the worker nodes is shut down during this, after some time
>>>>> the ELB stops sending requests to the nodes and the connections time
>>>>> out. The following log gets printed in the ELB.
>>>>>
>>>>> TID: [0] [ELB] [2014-10-31 06:27:32,517] INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null,
>>>>> Port: 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain,
>>>>> Sub-domain:worker, Active:true . Error Code: 101503
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:32,519] INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>>> Host:172.31.7.214, Port:4100
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:32,738] INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed over to Host:172.31.0.128, Remote Host:null, Port: 4100,
>>>>> HTTP:9763, HTTPS:9443, Domain: wso2.as.domain, Sub-domain:worker,
>>>>> Active:true
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:32,740] WARN
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback} - Connection
>>>>> refused or failed for : /172.31.7.214:9765
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:32,743] INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null,
>>>>> Port: 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain,
>>>>> Sub-domain:worker, Active:true . Error Code: 101503
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:32,745] INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>>> Host:172.31.7.214, Port:4100
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,518] INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed over to Host:172.31.7.214, Remote Host:null, Port: 4100,
>>>>> HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>>> Active:true
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,520] WARN
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback} - Connection
>>>>> refused or failed for : /172.31.7.214:9765
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,523] WARN
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback} - Connection
>>>>> refused or failed for : /172.31.7.214:9765
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,744] INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed over to Host:172.31.7.214, Remote Host:null, Port: 4100,
>>>>> HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>>> Active:true
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,745] INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null,
>>>>> Port: 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain,
>>>>> Sub-domain:worker, Active:true . Error Code: 101503
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,745] WARN
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback} - Connection
>>>>> refused or failed for : /172.31.7.214:9765
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,747] INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>>> Host:172.31.7.214, Port:4100
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:34,746] INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed over to Host:172.31.0.128, Remote Host:null, Port: 4100,
>>>>> HTTP:9763, HTTPS:9443, Domain: wso2.as.domain, Sub-domain:worker,
>>>>> Active:true
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:34,748] INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null,
>>>>> Port: 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain,
>>>>> Sub-domain:worker, Active:true . Error Code: 101503
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:34,750] INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>>> Host:172.31.7.214, Port:4100
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:35,749] INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed over to Host:172.31.7.214, Remote Host:null, Port: 4100,
>>>>> HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>>> Active:true
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:35,750] WARN
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback} - Connection
>>>>> refused or failed for : /172.31.7.214:9765
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:30,604] WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>>> out after request is read: http-incoming-61
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:32,606] WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>>> out after request is read: http-incoming-65
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:33,608] WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>>> out after request is read: http-incoming-73
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:33,608] WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>>> out after request is read: http-incoming-69
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:33,608] WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>>> out after request is read: http-incoming-64
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:33,609] WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>>> out after request is read: http-incoming-75
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:34,610] WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>>> out after request is read: http-incoming-60
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:34,611] WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>>> out after request is read: http-incoming-62
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:34,611] WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>>> out after request is read: http-incoming-67
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>>
>>>>> Need to restart the ELB to recover from this. Any idea what's going
>>>>> on? Is this a known issue? Can provide the full log if needed.
>>>>>
>>>>> Thanks.
>>>>> /Gayashan
>>>>>
>>>>> --
>>>>> *Gayashan Amarasinghe*
>>>>> Software Engineer | Platform TG
>>>>> WSO2, Inc. | http://wso2.com
>>>>> lean. enterprise. middleware
>>>>>
>>>>> Mobile : +94718314517
>>>>> Blog : gayashan-a.blogspot.com
>>>>>
>>>>
>>>> --
>>>> *Kishanthan Thangarajah*
>>>> Senior Software Engineer,
>>>> Platform Technologies Team,
>>>> WSO2, Inc.
>>>> lean.enterprise.middleware
>>>>
>>>> Mobile - +94773426635
>>>> Blog - http://kishanthan.wordpress.com
>>>> Twitter - http://twitter.com/kishanthan

--
*Gayashan Amarasinghe*
Software Engineer | Platform TG
WSO2, Inc. | http://wso2.com
lean. enterprise. middleware

Mobile : +94718314517
Blog : gayashan-a.blogspot.com
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
