Hi Kishanthan,

On Fri, Nov 7, 2014 at 10:45 AM, Kishanthan Thangarajah <[email protected]> wrote:

>
>
> On Thu, Nov 6, 2014 at 9:00 PM, Gayashan Amarasinghe <[email protected]>
> wrote:
>
>> Hi Kishanthan, all,
>>
>> This was a tricky situation, but I was able to identify the issue and fix
>> it. It was caused by the new Hazelcast upgrade.
>>
>> The HazelcastGroupManagementAgent maintains two collections of members: a
>> Hazelcast distributed map shared by the cluster, which holds all the
>> members in the cluster (the members map), and a connected members list,
>> which is maintained per subdomain. When a member leaves the cluster, a
>> MemberEntryListener and a GroupMembershipListener (and some other
>> listeners) get notified. The MemberEntryListener is notified whenever the
>> members map changes. When a member leaves, the entryRemoved method of this
>> listener also removes that member from the connectedMembers list. The
>> event it receives (EntryEvent) carries the member that left. In the
>> current implementation the member is obtained from the EntryEvent as
>> follows,
>>
>> entryEvent.getValue()
>>
>> So in the code we do this,
>>
>> connectedMembers.remove(entryEvent.getValue());
>>
>> In the previous Hazelcast version this returned the correct member.
>> However, with the new Hazelcast version it returns null, so the connected
>> members list is not updated properly. This is caused by a fix in
>> Hazelcast [1] [2].
>>
>> The TenantAwareLoadBalanceEndpoint in the ELB uses this connected members
>> list to pick the next application member to serve an incoming request.
>> Because the member that left was never removed from the list, the ELB
>> kept trying to send requests to disconnected members and eventually
>> became non-responsive.
>>
>> As a fix, I have identified that we can use
>>
>> entryEvent.getOldValue()
>>
>> to acquire the member that just left (Hazelcast issue [1] also suggests
>> using it).
>>
>> WDYT?
>>
>
> +1, looks like they have fixed the implementation properly and we should
> use the above for the member removed event. Good finding :)
> Also, I believe this only affects the member removed event type and we
> don't have to change anything for member added events?
>

Thanks, and yes, this only affects the member removed event. No changes for
the member added events.
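
To make the difference concrete, here is a minimal, self-contained sketch of the behaviour discussed above. It does not use the Hazelcast classes: MemberEvent and RemovedEvent are hypothetical stand-ins that only mimic the getValue()/getOldValue() accessors of EntryEvent as described in this thread, members are represented by plain host strings, and MemberRemovalSketch is not the actual HazelcastGroupManagementAgent code.

```java
import java.util.ArrayList;
import java.util.List;

public class MemberRemovalSketch {

    /** Hypothetical stand-in for the two EntryEvent accessors discussed in the thread. */
    public interface MemberEvent<V> {
        V getValue();     // current value of the map entry
        V getOldValue();  // value the entry held before the change
    }

    /** Models a removal event as described for the upgraded Hazelcast: no current value. */
    public static final class RemovedEvent<V> implements MemberEvent<V> {
        private final V oldValue;

        public RemovedEvent(V oldValue) {
            this.oldValue = oldValue;
        }

        @Override
        public V getValue() {
            return null; // the entry no longer exists, so there is no current value
        }

        @Override
        public V getOldValue() {
            return oldValue; // the member that was in the map before it left
        }
    }

    // Stand-in for the per-subdomain connected members list (thread safety not modelled).
    private final List<String> connectedMembers = new ArrayList<>();

    public void memberAdded(String member) {
        connectedMembers.add(member);
    }

    /** Previous approach: removes null, so the list is left unchanged. */
    public void entryRemovedUsingValue(MemberEvent<String> event) {
        connectedMembers.remove(event.getValue());
    }

    /** Proposed approach: removes the member that actually left. */
    public void entryRemovedUsingOldValue(MemberEvent<String> event) {
        connectedMembers.remove(event.getOldValue());
    }

    /** Returns a copy so callers cannot modify the internal list. */
    public List<String> connectedMembers() {
        return new ArrayList<>(connectedMembers);
    }

    public static void main(String[] args) {
        MemberRemovalSketch agent = new MemberRemovalSketch();
        agent.memberAdded("172.31.7.214");
        agent.memberAdded("172.31.0.128");

        MemberEvent<String> left = new RemovedEvent<>("172.31.7.214");

        agent.entryRemovedUsingValue(left);
        System.out.println("after getValue():    " + agent.connectedMembers());

        agent.entryRemovedUsingOldValue(left);
        System.out.println("after getOldValue(): " + agent.connectedMembers());
    }
}
```

In this sketch the first call leaves both hosts in the list, which corresponds to the ELB still failing over to the node that was shut down. The second call removes 172.31.7.214 and leaves only 172.31.0.128. Member added events are not modelled beyond memberAdded, since no change is needed there.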


/Gayashan


>
>>
>> I have created the JIRA [3] for this issue and will send the PR with the
>> fix.
>>
>> [1] https://github.com/hazelcast/hazelcast/issues/3198
>> [2] https://github.com/hazelcast/hazelcast/issues/3859
>> [3] https://wso2.org/jira/browse/CARBON-15057
>>
>> Thanks.
>> /Gayashan
>>
>> On Wed, Nov 5, 2014 at 5:47 PM, Kishanthan Thangarajah <
>> [email protected]> wrote:
>>
>>> Gayashan, please share your latest findings on this.
>>>
>>> When we see the member left message, the current member list should be
>>> updated with that event (the member gets removed). So the above can occur
>>> if that is not happening as expected. We should also compare the
>>> behaviour with and without the Hazelcast upgrade.
>>>
>>> On Fri, Oct 31, 2014 at 5:30 PM, Gayashan Amarasinghe <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> For Carbon testing we have a worker-mgt cluster fronted by an ELB, and
>>>> requests keep coming in from a JMeter client. If one (or more) of the
>>>> worker nodes is shut down during this, after some time the ELB stops
>>>> sending requests to the nodes and the connections time out. The
>>>> following log gets printed in the ELB.
>>>>
>>>> TID: [0] [ELB] [2014-10-31 06:27:32,517]  INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null, Port:
>>>> 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>> Active:true . Error Code: 101503
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:32,519]  INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>> Host:172.31.7.214, Port:4100
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:32,738]  INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed over to Host:172.31.0.128, Remote Host:null, Port: 4100, HTTP:9763,
>>>> HTTPS:9443, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:32,740]  WARN
>>>> {org.apache.synapse.transport.passthru.ConnectCallback} -  Connection
>>>> refused or failed for : /172.31.7.214:9765
>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>> TID: [0] [ELB] [2014-10-31 06:27:32,743]  INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null, Port:
>>>> 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>> Active:true . Error Code: 101503
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:32,745]  INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>> Host:172.31.7.214, Port:4100
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,518]  INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed over to Host:172.31.7.214, Remote Host:null, Port: 4100, HTTP:9765,
>>>> HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,520]  WARN
>>>> {org.apache.synapse.transport.passthru.ConnectCallback} -  Connection
>>>> refused or failed for : /172.31.7.214:9765
>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,523]  WARN
>>>> {org.apache.synapse.transport.passthru.ConnectCallback} -  Connection
>>>> refused or failed for : /172.31.7.214:9765
>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,744]  INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed over to Host:172.31.7.214, Remote Host:null, Port: 4100, HTTP:9765,
>>>> HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,745]  INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null, Port:
>>>> 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>> Active:true . Error Code: 101503
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,745]  WARN
>>>> {org.apache.synapse.transport.passthru.ConnectCallback} -  Connection
>>>> refused or failed for : /172.31.7.214:9765
>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>> TID: [0] [ELB] [2014-10-31 06:27:33,747]  INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>> Host:172.31.7.214, Port:4100
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:34,746]  INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed over to Host:172.31.0.128, Remote Host:null, Port: 4100, HTTP:9763,
>>>> HTTPS:9443, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:34,748]  INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null, Port:
>>>> 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>> Active:true . Error Code: 101503
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:34,750]  INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>> Host:172.31.7.214, Port:4100
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:35,749]  INFO
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>> Failed over to Host:172.31.7.214, Remote Host:null, Port: 4100, HTTP:9765,
>>>> HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>> TID: [0] [ELB] [2014-10-31 06:27:35,750]  WARN
>>>> {org.apache.synapse.transport.passthru.ConnectCallback} -  Connection
>>>> refused or failed for : /172.31.7.214:9765
>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>> TID: [0] [ELB] [2014-10-31 06:28:30,604]  WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>> out after request is read: http-incoming-61
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:32,606]  WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>> out after request is read: http-incoming-65
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:33,608]  WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>> out after request is read: http-incoming-73
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:33,608]  WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>> out after request is read: http-incoming-69
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:33,608]  WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>> out after request is read: http-incoming-64
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:33,609]  WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>> out after request is read: http-incoming-75
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:34,610]  WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>> out after request is read: http-incoming-60
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:34,611]  WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>> out after request is read: http-incoming-62
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>> TID: [0] [ELB] [2014-10-31 06:28:34,611]  WARN
>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>> out after request is read: http-incoming-67
>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>
>>>> We need to restart the ELB to recover from this. Any idea what's going
>>>> on? Is this a known issue? I can provide the full log if needed.
>>>>
>>>> Thanks.
>>>> /Gayashan
>>>>
>>>> --
>>>> *Gayashan Amarasinghe*
>>>> Software Engineer | Platform TG
>>>> WSO2, Inc. | http://wso2.com
>>>> lean. enterprise. middleware
>>>>
>>>> Mobile : +94718314517
>>>> Blog : gayashan-a.blogspot.com
>>>>
>>>
>>>
>>>
>>> --
>>> *Kishanthan Thangarajah*
>>> Senior Software Engineer,
>>> Platform Technologies Team,
>>> WSO2, Inc.
>>> lean.enterprise.middleware
>>>
>>> Mobile - +94773426635
>>> Blog - *http://kishanthan.wordpress.com
>>> <http://kishanthan.wordpress.com>*
>>> Twitter - *http://twitter.com/kishanthan
>>> <http://twitter.com/kishanthan>*
>>>
>>
>>
>>
>> --
>> *Gayashan Amarasinghe*
>> Software Engineer | Platform TG
>> WSO2, Inc. | http://wso2.com
>> lean. enterprise. middleware
>>
>> Mobile : +94718314517
>> Blog : gayashan-a.blogspot.com
>>
>
>
>
> --
> *Kishanthan Thangarajah*
> Senior Software Engineer,
> Platform Technologies Team,
> WSO2, Inc.
> lean.enterprise.middleware
>
> Mobile - +94773426635
> Blog - *http://kishanthan.wordpress.com <http://kishanthan.wordpress.com>*
> Twitter - *http://twitter.com/kishanthan <http://twitter.com/kishanthan>*
>



-- 
*Gayashan Amarasinghe*
Software Engineer | Platform TG
WSO2, Inc. | http://wso2.com
lean. enterprise. middleware

Mobile : +94718314517
Blog : gayashan-a.blogspot.com
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
