Please review and merge the pull request below, which contains the fix:

https://github.com/wso2/carbon4-kernel/pull/73
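
For reference, the change discussed in the thread below (reading the departed
member from getOldValue() instead of getValue() in entryRemoved) can be
sketched roughly as follows. This is only a minimal, self-contained
illustration and not the code in the pull request: RemovedEvent is a
hypothetical stand-in for Hazelcast's EntryEvent as it behaves on removal
after the upgrade, and plain strings stand in for the real member objects.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class MemberRemovalSketch {

    /** Hypothetical stand-in for an EntryEvent delivered on entry removal. */
    static final class RemovedEvent<K, V> {
        private final K key;
        private final V oldValue;

        RemovedEvent(K key, V oldValue) {
            this.key = key;
            this.oldValue = oldValue;
        }

        K getKey() { return key; }

        // Assumption: mirrors the post-upgrade behaviour described in the
        // thread, where a removal event no longer carries a current value.
        V getValue() { return null; }

        // The removed entry is exposed as the old value.
        V getOldValue() { return oldValue; }
    }

    // Stand-in for the per-subdomain connectedMembers list.
    static final List<String> connectedMembers = new CopyOnWriteArrayList<>();

    // Old handler: remove(null) matches nothing, so the member stays listed.
    static boolean entryRemovedBefore(RemovedEvent<String, String> event) {
        return connectedMembers.remove(event.getValue());
    }

    // Proposed handler: the old value identifies the member that left.
    static boolean entryRemovedAfter(RemovedEvent<String, String> event) {
        return connectedMembers.remove(event.getOldValue());
    }

    public static void main(String[] args) {
        connectedMembers.add("172.31.7.214:4100");
        connectedMembers.add("172.31.0.128:4100");

        RemovedEvent<String, String> left =
                new RemovedEvent<>("member-1", "172.31.7.214:4100");

        System.out.println("getValue():    removed=" + entryRemovedBefore(left)
                + " members=" + connectedMembers);
        System.out.println("getOldValue(): removed=" + entryRemovedAfter(left)
                + " members=" + connectedMembers);
    }
}
```

With the old handler the stale member remains in the list, which matches the
ELB repeatedly failing over to a node that has already left.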

Thanks.
/Gayashan

On Fri, Nov 7, 2014 at 10:50 AM, Gayashan Amarasinghe <[email protected]>
wrote:

> Hi Kishanthan,
>
> On Fri, Nov 7, 2014 at 10:45 AM, Kishanthan Thangarajah <
> [email protected]> wrote:
>
>>
>>
>> On Thu, Nov 6, 2014 at 9:00 PM, Gayashan Amarasinghe <[email protected]>
>> wrote:
>>
>>> Hi Kishanthan, all,
>>>
>>> This was a tricky situation, but I was able to identify the issue and fix
>>> it. It was caused by the recent Hazelcast upgrade.
>>>
>>> The HazelcastGroupManagementAgent maintains two collections of members:
>>> a Hazelcast distributed map shared by the cluster, which holds all the
>>> members in the cluster (the members map), and a connected members list,
>>> which is maintained per subdomain. When a member leaves the cluster, a
>>> MemberEntryListener, a GroupMembershipListener and some other listeners
>>> are notified. The MemberEntryListener is notified whenever the members
>>> map changes. When a member leaves, the entryRemoved method of this
>>> listener also removes that member from the connectedMembers list. The
>>> event it receives (EntryEvent) carries the member that left. In the
>>> current implementation the member is obtained from the EntryEvent as
>>> follows:
>>>
>>> entryEvent.getValue()
>>>
>>> So in the code we do this,
>>>
>>> connectedMembers.remove(entryEvent.getValue());
>>>
>>> In the previous Hazelcast version this returned the correct member.
>>> With the new Hazelcast version, however, it returns null, so the
>>> connected members list is not updated properly. This is caused by a fix
>>> in Hazelcast [1] [2].
>>>
>>> The TenantAwareLoadBalanceEndpoint in the ELB uses this connected
>>> members list to pick the next application member to serve an incoming
>>> request. As a result, the ELB kept trying to send requests to
>>> disconnected members and eventually became non-responsive.
>>>
>>> As a fix, I have found that we can use
>>>
>>> entryEvent.getOldValue()
>>>
>>> to obtain the member that just left (Hazelcast issue [1] also suggests
>>> using it).
>>>
>>> WDYT?
>>>
>>
>> +1, looks like they have fixed the implementation properly and we should
>> use the above for the member removed event. Good find :)
>> Also, I believe this only affects the member removed event type and we
>> don't have to change anything for member added events?
>>
>
> Thanks, and yes, this only affects the member removed event. No changes
> are needed for the member added events.
>
>
> /Gayashan
>
>
>>
>>>
>>> I have created the JIRA [3] for this issue and will send the PR with the
>>> fix.
>>>
>>> [1] https://github.com/hazelcast/hazelcast/issues/3198
>>> [2] https://github.com/hazelcast/hazelcast/issues/3859
>>> [3] https://wso2.org/jira/browse/CARBON-15057
>>>
>>> Thanks.
>>> /Gayashan
>>>
>>> On Wed, Nov 5, 2014 at 5:47 PM, Kishanthan Thangarajah <
>>> [email protected]> wrote:
>>>
>>>> Gayashan, please share your latest findings on this.
>>>>
>>>> When we see the member left message, the current member list is updated
>>>> with that event (the member gets removed). So the above can occur if
>>>> that is not happening as expected. We should also compare the behaviour
>>>> with and without the Hazelcast upgrade.
>>>>
>>>> On Fri, Oct 31, 2014 at 5:30 PM, Gayashan Amarasinghe <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> For Carbon testing we have a worker-mgt cluster fronted by an ELB, with
>>>>> requests continuously coming in from a JMeter client. If one (or more)
>>>>> of the worker nodes is shut down during this, after some time the ELB
>>>>> stops sending requests to the nodes and the connections time out. The
>>>>> following log is printed in the ELB.
>>>>>
>>>>> TID: [0] [ELB] [2014-10-31 06:27:32,517]  INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null,
>>>>> Port: 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>>> Active:true . Error Code: 101503
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:32,519]  INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>>> Host:172.31.7.214, Port:4100
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:32,738]  INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed over to Host:172.31.0.128, Remote Host:null, Port: 4100, HTTP:9763,
>>>>> HTTPS:9443, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:32,740]  WARN
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback} -  Connection
>>>>> refused or failed for : /172.31.7.214:9765
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:32,743]  INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null,
>>>>> Port: 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>>> Active:true . Error Code: 101503
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:32,745]  INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>>> Host:172.31.7.214, Port:4100
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,518]  INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed over to Host:172.31.7.214, Remote Host:null, Port: 4100, HTTP:9765,
>>>>> HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,520]  WARN
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback} -  Connection
>>>>> refused or failed for : /172.31.7.214:9765
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,523]  WARN
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback} -  Connection
>>>>> refused or failed for : /172.31.7.214:9765
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,744]  INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed over to Host:172.31.7.214, Remote Host:null, Port: 4100, HTTP:9765,
>>>>> HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,745]  INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null,
>>>>> Port: 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>>> Active:true . Error Code: 101503
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,745]  WARN
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback} -  Connection
>>>>> refused or failed for : /172.31.7.214:9765
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:33,747]  INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>>> Host:172.31.7.214, Port:4100
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:34,746]  INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed over to Host:172.31.0.128, Remote Host:null, Port: 4100, HTTP:9763,
>>>>> HTTPS:9443, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:34,748]  INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed to send message to Member Host:172.31.7.214, Remote Host:null,
>>>>> Port: 4100, HTTP:9765, HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker,
>>>>> Active:true . Error Code: 101503
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:34,750]  INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Dropping the faulty/unreachable Member with Domain:wso2.as.domain,
>>>>> Host:172.31.7.214, Port:4100
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:35,749]  INFO
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint} -
>>>>> Failed over to Host:172.31.7.214, Remote Host:null, Port: 4100, HTTP:9765,
>>>>> HTTPS:9445, Domain: wso2.as.domain, Sub-domain:worker, Active:true
>>>>> {org.wso2.carbon.lb.endpoint.endpoint.TenantAwareLoadBalanceEndpoint}
>>>>> TID: [0] [ELB] [2014-10-31 06:27:35,750]  WARN
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback} -  Connection
>>>>> refused or failed for : /172.31.7.214:9765
>>>>> {org.apache.synapse.transport.passthru.ConnectCallback}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:30,604]  WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>>> out after request is read: http-incoming-61
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:32,606]  WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>>> out after request is read: http-incoming-65
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:33,608]  WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>>> out after request is read: http-incoming-73
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:33,608]  WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>>> out after request is read: http-incoming-69
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:33,608]  WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>>> out after request is read: http-incoming-64
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:33,609]  WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>>> out after request is read: http-incoming-75
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:34,610]  WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>>> out after request is read: http-incoming-60
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:34,611]  WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>>> out after request is read: http-incoming-62
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>> TID: [0] [ELB] [2014-10-31 06:28:34,611]  WARN
>>>>> {org.apache.synapse.transport.passthru.SourceHandler} -  Connection time
>>>>> out after request is read: http-incoming-67
>>>>> {org.apache.synapse.transport.passthru.SourceHandler}
>>>>>
>>>>> We need to restart the ELB to recover from this. Any idea what's going
>>>>> on? Is this a known issue? I can provide the full log if needed.
>>>>>
>>>>> Thanks.
>>>>> /Gayashan
>>>>>
>>>>> --
>>>>> *Gayashan Amarasinghe*
>>>>> Software Engineer | Platform TG
>>>>> WSO2, Inc. | http://wso2.com
>>>>> lean. enterprise. middleware
>>>>>
>>>>> Mobile : +94718314517
>>>>> Blog : gayashan-a.blogspot.com
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Kishanthan Thangarajah*
>>>> Senior Software Engineer,
>>>> Platform Technologies Team,
>>>> WSO2, Inc.
>>>> lean.enterprise.middleware
>>>>
>>>> Mobile - +94773426635
>>>> Blog - *http://kishanthan.wordpress.com
>>>> <http://kishanthan.wordpress.com>*
>>>> Twitter - *http://twitter.com/kishanthan
>>>> <http://twitter.com/kishanthan>*
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>



-- 
*Gayashan Amarasinghe*
Software Engineer | Platform TG
WSO2, Inc. | http://wso2.com
lean. enterprise. middleware

Mobile : +94718314517
Blog : gayashan-a.blogspot.com
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
