Hi Kishanthan,

On Fri, Nov 7, 2014 at 10:45 AM, Kishanthan Thangarajah <[email protected]> wrote:
>
> On Thu, Nov 6, 2014 at 9:00 PM, Gayashan Amarasinghe <[email protected]>
> wrote:
>
>> Hi Kishanthan, all,
>>
>> This was a tricky situation, but I was able to identify the issue and fix
>> it. It was caused by the new Hazelcast upgrade.
>>
>> There are two collections of members maintained in the
>> HazelcastGroupManagementAgent: a Hazelcast distributed map shared by the
>> cluster, which contains all the members in the cluster (the members map),
>> and a connected members list, which is maintained per subdomain in the
>> cluster. When a member leaves the cluster, a MemberEntryListener and a
>> GroupMembershipListener (and some other listeners) get notified. The
>> MemberEntryListener gets notified when the members map changes. When a
>> member leaves, the entryRemoved method of this listener also removes the
>> member that just left from the connectedMembers list. The event it
>> receives (EntryEvent) carries the member that left. In the current
>> implementation this member is acquired from the EntryEvent as follows:
>>
>> entryEvent.getValue()
>>
>> So in the code we do this:
>>
>> connectedMembers.remove(entryEvent.getValue());
>>
>> In the previous Hazelcast version this returned the correct member.
>> However, with the new Hazelcast version it returns null, so the
>> connected members list does not get updated properly. This is caused by
>> a fix in Hazelcast [1] [2].
>>
>> The TenantAwareLoadBalanceEndpoint in the ELB uses this connected members
>> list to get the next application member to serve the incoming request.
>> This is why the ELB kept trying to send requests to disconnected members
>> and eventually became non-responsive.
>>
>> As a fix, I have identified that we can use
>>
>> entryEvent.getOldValue()
>>
>> to acquire the member that just left (Hazelcast issue [1] also suggests
>> using it).
>>
>> WDYT?
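To make the difference between the two getters concrete, here is a minimal, self-contained sketch. It does not use Hazelcast at all: the nested EntryEvent class is a hypothetical stand-in that only mimics the behaviour described above (getValue() is null on removal, getOldValue() holds the departed member), members are represented as plain strings rather than the real member objects, and the method names are made up for illustration. The actual listener code in HazelcastGroupManagementAgent will look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MemberRemovalSketch {

    /**
     * Hypothetical stand-in for Hazelcast's EntryEvent, reduced to the
     * getters discussed in this thread. It is NOT the real Hazelcast class.
     */
    public static final class EntryEvent<K, V> {
        private final K key;
        private final V oldValue;
        private final V value;

        public EntryEvent(K key, V oldValue, V value) {
            this.key = key;
            this.oldValue = oldValue;
            this.value = value;
        }

        public K getKey() { return key; }
        public V getOldValue() { return oldValue; }
        public V getValue() { return value; }
    }

    /** Builds a removal event as described above: value is null, old value is the departed member. */
    public static EntryEvent<String, String> removalEvent(String key, String departedMember) {
        return new EntryEvent<>(key, departedMember, null);
    }

    /** Current logic: getValue() is null on removal, so nothing is removed from the list. */
    public static boolean removeUsingValue(List<String> connectedMembers,
                                           EntryEvent<String, String> event) {
        return connectedMembers.remove(event.getValue());
    }

    /** Proposed logic: getOldValue() carries the member that just left. */
    public static boolean removeUsingOldValue(List<String> connectedMembers,
                                              EntryEvent<String, String> event) {
        String departed = event.getOldValue();
        return departed != null && connectedMembers.remove(departed);
    }

    public static void main(String[] args) {
        // Illustrative member identifiers only; the real list holds member objects.
        List<String> connectedMembers =
                new ArrayList<>(Arrays.asList("172.31.7.214:4100", "172.31.0.128:4100"));
        EntryEvent<String, String> event = removalEvent("member-1", "172.31.7.214:4100");

        System.out.println("getValue() removed a member: "
                + removeUsingValue(connectedMembers, event));
        System.out.println("getOldValue() removed a member: "
                + removeUsingOldValue(connectedMembers, event));
        System.out.println("Remaining members: " + connectedMembers);
    }
}
```

With the getValue() variant the stale member stays in the list, which matches the ELB repeatedly failing over to a host that is already down; with getOldValue() the list shrinks as expected.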
>>
>
> +1, looks like they have fixed the implementation properly and we should
> use the above for the member removed event. Good findings :)
> Also, I believe this only affects the member removed event type and we
> don't have to change anything for member added events?
>

Thanks, and yes, this only affects the member removed event. No changes for
the member added events.

/Gayashan

>
>>
>> I have created the JIRA [3] for this issue and will send the PR with the
>> fix.
>>
>> [1] https://github.com/hazelcast/hazelcast/issues/3198
>> [2] https://github.com/hazelcast/hazelcast/issues/3859
>> [3] https://wso2.org/jira/browse/CARBON-15057
>>
>> Thanks.
>> /Gayashan
>>
>> On Wed, Nov 5, 2014 at 5:47 PM, Kishanthan Thangarajah <
>> [email protected]> wrote:
>>
>>> Gayashan, please share your latest findings on this.
>>>
>>> When we see the member left message, the current member list is updated
>>> with that event (the member gets removed). So the above can occur if
>>> that update is not happening as expected. We should also compare the
>>> behavior with and without the Hazelcast upgrade.
>>>
>>> On Fri, Oct 31, 2014 at 5:30 PM, Gayashan Amarasinghe <[email protected]
>>> > wrote:
>>>
>>>> Hi all,
>>>>
>>>> For Carbon testing we have a worker-mgt cluster fronted by the ELB, and
>>>> requests keep coming in from a JMeter client. During this, if one (or
>>>> more) of the worker nodes is shut down, after some time the ELB stops
>>>> sending requests to the nodes and the connections time out. The
>>>> following log gets printed in the ELB.
>>>>
>>>> TID: [0] [ELB] [2014-10-31 06:27:32,517] INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null, Port:
>>>> 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>> Active:true .
>>>> Error Code: 101503
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:32,519] INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>> Host:172.31.7.214, Port:4100
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:32,738] INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed over to Host:172.31.0.128, Remote Host:null, Port: 4100, HTTP:9763,
>>>> HTTPS:9443, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:32,740] WARN
>>>> {org.apache.synapse.transport.passthru.ConnectCallback} - Connection
>>>> refused or failed for : /172.31.7.214:9765
>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>> TID: [0] [ELB] [2014-10-31 06:27:32,743] INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null, Port:
>>>> 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>> Active:true . Error Code: 101503
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:32,745] INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>> Host:172.31.7.214, Port:4100
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,518] INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed over to Host:172.31.7.214, Remote Host:null, Port: 4100, HTTP:9765,
>>>> HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,520] WARN
>>>> {org.apache.synapse.transport.passthru.ConnectCallback} - Connection
>>>> refused or failed for : /172.31.7.214:9765
>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,523] WARN
>>>> {org.apache.synapse.transport.passthru.ConnectCallback} - Connection
>>>> refused or failed for : /172.31.7.214:9765
>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,744] INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed over to Host:172.31.7.214, Remote Host:null, Port: 4100, HTTP:9765,
>>>> HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,745] INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null, Port:
>>>> 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>> Active:true . Error Code: 101503
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,745] WARN
>>>> {org.apache.synapse.transport.passthru.ConnectCallback} - Connection
>>>> refused or failed for : /172.31.7.214:9765
>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,747] INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>> Host:172.31.7.214, Port:4100
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:34,746] INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed over to Host:172.31.0.128, Remote Host:null, Port: 4100, HTTP:9763,
>>>> HTTPS:9443, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:34,748] INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null, Port:
>>>> 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>> Active:true . Error Code: 101503
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:34,750] INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>> Host:172.31.7.214, Port:4100
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:35,749] INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed over to Host:172.31.7.214, Remote Host:null, Port: 4100, HTTP:9765,
>>>> HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:35,750] WARN
>>>> {org.apache.synapse.transport.passthru.ConnectCallback} - Connection
>>>> refused or failed for : /172.31.7.214:9765
>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>> TID: [0] [ELB] [2014-10-31 06:28:30,604] WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>> out after request is read: http-incoming-61
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:32,606] WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>> out after request is read: http-incoming-65
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:33,608] WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>> out after request is read: http-incoming-73
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:33,608] WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>> out after request is read: http-incoming-69
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:33,608] WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>> out after request is read: http-incoming-64
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:33,609] WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>> out after request is read: http-incoming-75
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:34,610] WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>> out after request is read: http-incoming-60
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:34,611] WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>> out after request is read: http-incoming-62
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:34,611] WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} - Connection time
>>>> out after request is read: http-incoming-67
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>
>>>> Need to restart the ELB to recover from this. Any idea what's going
>>>> on? Is this a known issue? Can provide the full log if needed.
>>>>
>>>> Thanks.
>>>> /Gayashan
>>>>
>>>> --
>>>> *Gayashan Amarasinghe*
>>>> Software Engineer | Platform TG
>>>> WSO2, Inc. | http://wso2.com
>>>> lean. enterprise. middleware
>>>>
>>>> Mobile : +94718314517
>>>> Blog : gayashan-a.blogspot.com
>>>>
>>>
>>> --
>>> *Kishanthan Thangarajah*
>>> Senior Software Engineer,
>>> Platform Technologies Team,
>>> WSO2, Inc.
>>> lean.enterprise.middleware
>>>
>>> Mobile - +94773426635
>>> Blog - http://kishanthan.wordpress.com
>>> Twitter - http://twitter.com/kishanthan
>>>
>>
>> --
>> *Gayashan Amarasinghe*
>> Software Engineer | Platform TG
>> WSO2, Inc. | http://wso2.com
>> lean. enterprise. middleware
>>
>> Mobile : +94718314517
>> Blog : gayashan-a.blogspot.com
>>
>
> --
> *Kishanthan Thangarajah*
> Senior Software Engineer,
> Platform Technologies Team,
> WSO2, Inc.
> lean.enterprise.middleware
>
> Mobile - +94773426635
> Blog - http://kishanthan.wordpress.com
> Twitter - http://twitter.com/kishanthan
>

--
*Gayashan Amarasinghe*
Software Engineer | Platform TG
WSO2, Inc. | http://wso2.com
lean. enterprise. middleware

Mobile : +94718314517
Blog : gayashan-a.blogspot.com
_______________________________________________ Dev mailing list [email protected] http://wso2.org/cgi-bin/mailman/listinfo/dev
