Re: [Dev] Fixing Shutdown Errors WSO2 MB.

Ishara Premadasa Mon, 12 May 2014 05:08:44 -0700

Hi Hasitha,

The Cassandra service related shutdown errors have now been fixed by Prabath.
However, I noticed a few more shutdown errors coming from the Andes side when
gracefully shutting down cluster nodes, as below.


[2014-05-12 12:58:04,856]  INFO
{org.wso2.andes.server.cluster.ClusterManager} -  Handling cluster gossip:
Node with ID 0 left the cluster
[2014-05-12 12:58:04,856] ERROR
{org.wso2.andes.server.cluster.ClusterManager} -  Error while removing node
details
java.lang.RuntimeException: Error accessing Node details to cassandra
database
    at
org.wso2.andes.server.store.CassandraMessageStore.deleteNodeData(CassandraMessageStore.java:4817)
    at
org.wso2.andes.server.cluster.ClusterManager$NodeExistenceListener.process(ClusterManager.java:609)
    at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

[2014-05-12 14:42:27,481] ERROR
{org.wso2.andes.server.cluster.ClusterManager} -  Error in getting
destination queues from store
org.wso2.andes.AMQStoreException: Error in loading queues [error code 541:
internal error]
    at
org.wso2.andes.server.store.CassandraMessageStore.getDestinationQueues(CassandraMessageStore.java:4024)
    at
org.wso2.andes.server.cluster.ClusterManager.getDestinationQueuesInCluster(ClusterManager.java:805)
    at
org.wso2.andes.server.cluster.GlobalQueueWorker.isThisWorkerShouldWork(GlobalQueueWorker.java:216)
    at
org.wso2.andes.server.cluster.GlobalQueueWorker.run(GlobalQueueWorker.java:75)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.wso2.andes.server.store.util.CassandraDataAccessException:
Error while getting data from : QueueDetails
    at
org.wso2.andes.server.store.util.CassandraDataAccessHelper.getStringTypeColumnsInARow(CassandraDataAccessHelper.java:700)
    at
org.wso2.andes.server.store.CassandraMessageStore.getDestinationQueues(CassandraMessageStore.java:4013)
    ... 6 more


The reason seems to be that when we close the CassandraMessageStore, we
disconnect the Hector client connection to the cluster:
      cluster.getConnectionManager().shutdown();

However, after that, node existence listeners and similar callbacks are still
triggered for the node that is shutting down, and this causes the error logs.
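
One way to keep such late callbacks from reaching a closed store is to guard
them with a closed flag. The following is only a rough sketch: GuardedStore
and NodeLeftListener are hypothetical stand-ins I made up for illustration,
not the real CassandraMessageStore or ClusterManager code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the message store (not the real Andes class).
class GuardedStore {
    private boolean closed = false;
    private final List<String> deletedNodes = new ArrayList<String>();

    /** Marks the store closed; real code would also release the client connection here. */
    synchronized void close() {
        closed = true;
    }

    synchronized boolean isClosed() {
        return closed;
    }

    /** Fails fast instead of touching a connection that is already gone. */
    synchronized void deleteNodeData(String nodeId) {
        if (closed) {
            throw new IllegalStateException("store already closed");
        }
        deletedNodes.add(nodeId);
    }

    synchronized List<String> deletedNodes() {
        return new ArrayList<String>(deletedNodes);
    }
}

// Hypothetical listener that skips cluster events once the store is closed.
class NodeLeftListener {
    private final GuardedStore store;

    NodeLeftListener(GuardedStore store) {
        this.store = store;
    }

    /** Returns true if the event was processed, false if it was skipped. */
    boolean onNodeLeft(String nodeId) {
        if (store.isClosed()) {
            return false; // late event during shutdown: nothing to clean up locally
        }
        try {
            store.deleteNodeData(nodeId);
            return true;
        } catch (IllegalStateException e) {
            return false; // store was closed between the check and the call
        }
    }
}
```

With this guard a late "node left" event is dropped quietly instead of
producing an ERROR log, but it does not do the cleanup itself, so it only
helps if another node takes care of the leaving node's data.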

IMO we have to shut down the connection to the cluster only after all of
these operations have been processed. Or is there a requirement to close the
Hector client pool at this point?
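
To make the ordering I have in mind concrete, here is a minimal sketch using
plain java.util.concurrent. OrderedShutdown and its fields are hypothetical
names for illustration only; in the real code the closeConnection step would
be wherever we call the Hector connection manager shutdown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical coordinator; the names are illustrative, not Andes classes.
class OrderedShutdown {
    private final ExecutorService workers;   // threads that still read/write the store
    private final Runnable closeConnection;  // stands in for the connection shutdown call
    private final List<String> steps =
            Collections.synchronizedList(new ArrayList<String>());

    OrderedShutdown(ExecutorService workers, Runnable closeConnection) {
        this.workers = workers;
        this.closeConnection = closeConnection;
    }

    /** Returns true if all in-flight work finished before the timeout. */
    boolean shutdown(long timeout, TimeUnit unit) {
        // 1. Stop accepting new work; tasks already submitted still run.
        workers.shutdown();
        steps.add("workers-stopped");

        // 2. Wait for in-flight store operations to finish.
        boolean drained = false;
        try {
            drained = workers.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        steps.add(drained ? "drained" : "drain-timeout");

        // 3. Only now close the connection to the cluster.
        closeConnection.run();
        steps.add("connection-closed");
        return drained;
    }

    /** Snapshot of the steps taken so far, in order. */
    List<String> steps() {
        return new ArrayList<String>(steps);
    }
}
```

The timeout is there so that a stuck worker cannot hold the shutdown forever;
in that case the connection is still closed and the caller can see from the
return value that some work may not have completed.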

Thanks!



On Thu, May 8, 2014 at 3:38 PM, Ishara Premadasa <ish...@wso2.com> wrote:

> Hi PrabathA et al.,
>
> We are still experiencing the issue reported in [1] when shutting down. As
> Hasitha stated, it seems the Cassandra messaging service thread is shut down
> before we finish all data cleanup tasks in MB, as the logs below show.
>
>
> *[2014-05-08 14:51:15,299]  INFO
> {org.apache.cassandra.net.MessagingService} -  Waiting for messaging
> service to quiesce*
> *[2014-05-08 14:51:15,303]  INFO
> {org.apache.cassandra.net.MessagingService} -  MessagingService shutting
> down server thread.*
> [2014-05-08 14:51:15,722]  INFO
> {org.wso2.andes.server.store.CassandraMessageStore} -  Clearing up
> Subscription Information
> [2014-05-08 14:51:15,722]  INFO
> {org.wso2.andes.server.cassandra.DefaultClusteringEnabledSubscriptionManager}
> -  Clearing the Persisted State of Node with ID 0
> [2014-05-08 14:51:15,733] ERROR
> {org.apache.cassandra.thrift.CustomTThreadPoolServer} -  Error occurred
> during processing of message.
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
> shut down
>     at
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:61)
>
> Can we please know the progress on fixing this issue, or else can you
> suggest what might cause the Cassandra service to be closed before the
> deactivation of Andes tasks has completed?
>
> Thanks!
> Ishara
>
> [1] https://wso2.org/jira/browse/MB-301
>
>
> On Mon, May 5, 2014 at 9:11 PM, Hasitha Hiranya <hasit...@wso2.com> wrote:
>
>> Hi,
>>
>> Following is Prabath's explanation:
>>
>> There is no impact on Cassandra from the Transport Listener Framework.
>> Carbon transport listener framework opens up the ports used by the
>> transports configured upon the Carbon Server. However, here in this
>> particular scenario, Cassandra daemon is responsible for opening up all the
>> ports used by a given Cassandra Server. Therefore, Cassandra related ports
>> are opened up much later after the transport listener framework is
>> initialised. Similarly, when it comes to shutting down the server, the
>> Cassandra daemon is stopped before we shut down the listener framework.
>>
>> So Prabath, how does this affect the OSGI integration of services? How
>> are we going to hold Cassandra until Andes is closed gracefully?
>>
>> Thanks
>>
>>
>> On Sat, May 3, 2014 at 12:59 PM, Hasitha Hiranya <hasit...@wso2.com>wrote:
>>
>>> Hi,
>>>
>>> Prabath is looking into the issue. Prabath, were you able to find out
>>> the reason why client transports are closed for Cassandra? Can you shed
>>> some light on this?
>>>
>>>  Thanks
>>>
>>>
>>> On Fri, May 2, 2014 at 11:19 AM, Hasitha Hiranya <hasit...@wso2.com>wrote:
>>>
>>>> Looping in PrabathA.
>>>>
>>>>
>>>> On Fri, May 2, 2014 at 11:18 AM, Hasitha Hiranya <hasit...@wso2.com>wrote:
>>>>
>>>>> Hi Shameera,
>>>>>
>>>>> Good catch.
>>>>>
>>>>> [2014-05-02 10:12:19,015]  INFO
>>>>> {org.wso2.carbon.core.ServerManagement} -  Stopped all transport listeners
>>>>> [2014-05-02 10:12:19,015]  INFO
>>>>> {org.wso2.carbon.core.ServerManagement} -  Waiting for request service
>>>>> completion...
>>>>> [2014-05-02 10:12:19,019]  INFO
>>>>> {org.wso2.carbon.core.ServerManagement} -  All requests have been served.
>>>>> [2014-05-02 10:12:19,019]  INFO
>>>>> {org.wso2.carbon.core.ServerManagement} -  Waiting for deployment
>>>>> completion...
>>>>>  [2014-05-02 10:12:19,021]  INFO
>>>>> {org.apache.cassandra.transport.Server} -  Stop listening for CQL clients
>>>>>
>>>>> What happens when all transport listeners are stopped?
>>>>> "Stop listening for CQL clients" means Cassandra will no longer
>>>>> accept requests from a CQL client. Most probably the same goes for Hector
>>>>> (Thrift) as well. That might cause these issues.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> On Fri, May 2, 2014 at 11:12 AM, Shameera Rathnayaka <
>>>>> shame...@wso2.com> wrote:
>>>>>
>>>>>> Hi HasithaH,
>>>>>>
>>>>>> In the shutdown logs I could see the following line before Andes
>>>>>> deactivation starts. What does it actually mean? Does it stop the
>>>>>> Cassandra transport listener?
>>>>>>
>>>>>> [2014-05-02 10:12:19,021]  INFO
>>>>>> {org.apache.cassandra.transport.Server} -  Stop listening for CQL clients
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, May 2, 2014 at 10:26 AM, Hasitha Hiranya 
>>>>>> <hasit...@wso2.com>wrote:
>>>>>>
>>>>>>> Hi Shameera,
>>>>>>>
>>>>>>> I have added logs and tested. The full log is attached at (
>>>>>>> https://drive.google.com/a/wso2.com/file/d/0B57HoxWKqqNnN2FPRE9FeC0yYXM/edit?usp=sharing).
>>>>>>> The deactivate method of the Andes service is as follows.
>>>>>>>
>>>>>>>     protected void deactivate(ComponentContext ctx) {
>>>>>>>         // Unregister QpidService
>>>>>>>         System.out.println("+++++++++++++++++++Started deactivating andes");
>>>>>>>         System.out.println("++++++++++++Unregistering qpid service");
>>>>>>>         try {
>>>>>>>             if (null != qpidService) {
>>>>>>>                 qpidService.unregister();
>>>>>>>             }
>>>>>>>         } catch (Exception e) {}
>>>>>>>         System.out.println("+++++++++++++++++Unregistered qpidService");
>>>>>>>         // Shutdown the Qpid broker
>>>>>>>         System.out.println("+++++++++++++++++Shutting down andes");
>>>>>>>         ApplicationRegistry.remove();
>>>>>>>         System.out.println("+++++++++++done shutting down andes");
>>>>>>>         System.out.println("+++++++++++done deactivating of andes component");
>>>>>>>     }
>>>>>>>
>>>>>>> +++++++++++++++++++Started deactivating andes
>>>>>>> ++++++++++++Unregistering qpid service
>>>>>>> +++++++++++++++++Unregistered qpidService
>>>>>>> +++++++++++++++++Shutting down andes
>>>>>>> +++++++++++done shutting down andes
>>>>>>> +++++++++++done deactivating of andes component
>>>>>>> ++++++++++++++++++++started deactivating cassandra
>>>>>>> ++++++++++++++++++done deactivating cassandra
>>>>>>>
>>>>>>> I had a doubt: is it correct to unregister qpidService before
>>>>>>> actually shutting down the broker?
>>>>>>> So I changed the code, swapping the order.
>>>>>>>
>>>>>>>     protected void deactivate(ComponentContext ctx) {
>>>>>>>         // Shutdown the Qpid broker
>>>>>>>         ApplicationRegistry.remove();
>>>>>>>         // Unregister QpidService
>>>>>>>         try {
>>>>>>>             if (null != qpidService) {
>>>>>>>                 qpidService.unregister();
>>>>>>>             }
>>>>>>>         } catch (Exception e) {}
>>>>>>>     }
>>>>>>>
>>>>>>>
>>>>>>> The errors still happened. The order was as follows.
>>>>>>>
>>>>>>> +++++++++++++++++++Started deactivating andes
>>>>>>> +++++++++++++++++++++shutting down andes
>>>>>>> +++++++++++done shutting down andes
>>>>>>> unregistering qpidservice
>>>>>>>  +++++++++++++++++Unregistered qpidService
>>>>>>> +++++++++++done deactivating of andes component
>>>>>>> ++++++++++++++++++++started deactivating cassandra
>>>>>>> ++++++++++++++++++done deactivating cassandra
>>>>>>>
>>>>>>> Pom file has cassandra as a dependency.
>>>>>>>
>>>>>>>                         <Import-Package>
>>>>>>>                             org.apache.axis2.*; version="${axis2.osgi.version.range.qpid}",
>>>>>>>                             org.apache.axiom.*; version="${axiom.osgi.version.range.qpid}",
>>>>>>>                             org.wso2.carbon.andes.authentication.service,
>>>>>>>                             org.wso2.carbon.andes.commons,
>>>>>>>                             org.wso2.carbon.andes.commons.registry,
>>>>>>>                          *   org.wso2.carbon.cassandra.server; version="4.2.2",*
>>>>>>>                             *;resolution:=optional
>>>>>>>                         </Import-Package>
>>>>>>>
>>>>>>> What is going wrong?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 2, 2014 at 9:33 AM, Shameera Rathnayaka <
>>>>>>> shame...@wso2.com> wrote:
>>>>>>>
>>>>>>>> Hi HasithaH,
>>>>>>>>
>>>>>>>> Shall we try with log messages to identify service deactivation and
>>>>>>>> bundle undeployment order of andes and cassandra ?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Shameera.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 2, 2014 at 9:18 AM, Hasitha Hiranya 
>>>>>>>> <hasit...@wso2.com>wrote:
>>>>>>>>
>>>>>>>>> During testing I followed these steps.
>>>>>>>>>
>>>>>>>>> 1. Create a topic subscriber.
>>>>>>>>> 2. Publish 1000 msgs.
>>>>>>>>> 3. Wait until the subscriber gets 1000 messages and closes.
>>>>>>>>> 4. Underneath, MB will still be leisurely deleting the content
>>>>>>>>> of removed messages (with timeouts etc.).
>>>>>>>>> 5. I shut down the broker with Ctrl+C.
>>>>>>>>> 6. Now, with my fixes above, it will delete all records that need
>>>>>>>>> to be deleted before shutting down.
>>>>>>>>>
>>>>>>>>> I can see that when the code is at step 6, MB says Cassandra is
>>>>>>>>> down.
>>>>>>>>> Thus, before returning from the Close() of the message store (hence
>>>>>>>>> before returning from deactivate of the Andes service), the Cassandra
>>>>>>>>> service disappears. It boils down to an OSGi problem.
>>>>>>>>>
>>>>>>>>> @Shameera,
>>>>>>>>>
>>>>>>>>> I have the dependency on the Cassandra bundle in the Andes bundle, as
>>>>>>>>> you suggested, but there still seems to be a problem. Any idea
>>>>>>>>> why that happens?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, May 1, 2014 at 10:56 AM, Hasitha Hiranya <
>>>>>>>>> hasit...@wso2.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Also, in order to stop the connection to Cassandra gracefully, we need
>>>>>>>>>> to do the following.
>>>>>>>>>>
>>>>>>>>>>         cluster.getConnectionManager().shutdown();
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, May 1, 2014 at 10:52 AM, Hasitha Hiranya <
>>>>>>>>>> hasit...@wso2.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I intend to clean up the graceful shutdown code of WSO2 Message
>>>>>>>>>>> Broker in the following way. We have to do this as part of fixing
>>>>>>>>>>> the shutdown errors. We have managed to keep Cassandra up until the
>>>>>>>>>>> broker service shuts down properly in the OSGi env, but we see
>>>>>>>>>>> problems because the following are missing.
>>>>>>>>>>>
>>>>>>>>>>> 1. When shutting down we have to flush all
>>>>>>>>>>> pubSubMessageContentRemoverTasks, meaning we have to delete all
>>>>>>>>>>> acked messages for topics; otherwise they will never be removed
>>>>>>>>>>> (after shutting down, the in-memory state is gone). The concern is
>>>>>>>>>>> that we have to wait for those messages to time out, which puts the
>>>>>>>>>>> MB shutdown on hold until all messages have timed out. For now MB
>>>>>>>>>>> will shut down hoping some other node will clear them up.
>>>>>>>>>>>
>>>>>>>>>>> 2. The above argument applies to content removal tasks as well.
>>>>>>>>>>> Merely stopping the deletion thread will not help.
>>>>>>>>>>>
>>>>>>>>>>> 3. The above two tasks should be done AFTER stopping the queue/topic
>>>>>>>>>>> flusher threads.
>>>>>>>>>>>
>>>>>>>>>>> 4. When shutting down we have to clear the in-memory message status
>>>>>>>>>>> (for the message count to be correct).
>>>>>>>>>>>
>>>>>>>>>>> 5. We have to copy NQ messages back to GQ.
>>>>>>>>>>>
>>>>>>>>>>> 6. Flush message counts.
>>>>>>>>>>>
>>>>>>>>>>> @pamod,
>>>>>>>>>>>
>>>>>>>>>>> You have a fix to flush the message count before shutdown (as we
>>>>>>>>>>> update it per message chunk). Is it committed? If so, where is the
>>>>>>>>>>> code? It should come in as point 6.
>>>>>>>>>>>
>>>>>>>>>>> Apart from point 6, the others are done. Testing now.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Hasitha Abeykoon*
>>>>>>>>>>> Software Engineer; WSO2, Inc.; http://wso2.com
>>>>>>>>>>> *cell:* *+94 719363063*
>>>>>>>>>>> *blog: **abeykoon.blogspot.com* <http://abeykoon.blogspot.com>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Hasitha Abeykoon*
>>>>>>>>> Senior Software Engineer; WSO2, Inc.; http://wso2.com
>>>>>>>>>  *cell:* *+94 719363063*
>>>>>>>>> *blog: **abeykoon.blogspot.com* <http://abeykoon.blogspot.com>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Software Engineer - WSO2 Inc.*
>>>>>>>> *email: shameera AT wso2.com <shame...@wso2.com> , shameera AT
>>>>>>>> apache.org <shame...@apache.org>*
>>>>>>>> *phone:  +9471 922 1454 <%2B9471%20922%201454>*
>>>>>>>>
>>>>>>>> *Linked in : *
>>>>>>>> http://lk.linkedin.com/pub/shameera-rathnayaka/1a/661/561
>>>>>>>> *Twitter     : *https://twitter.com/Shameera_R
>>>>>>>>
> --
> Ishara Premasada
> Software Engineer,
> WSO2 Inc. http://wso2.com/
>
>
> Blog    : http://isharapremadasa.blogspot.com/
> Twitter : https://twitter.com/ishadil
> Mobile  : +94 714445832
>
>
>


-- 
Ishara Premasada
Software Engineer,
WSO2 Inc. http://wso2.com/


Blog    : http://isharapremadasa.blogspot.com/
Twitter : https://twitter.com/ishadil
Mobile  : +94 714445832

_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
