Roman, the current stable/4.1 has some fixes that make this less likely to
occur, and it is the most likely to recover when it does.
That said, I've done some tracing and there are some issues with
nova-conductor processing those messages. Sometimes I've seen the compute
node be the issue, other times nova-conductor. As of stable/4.1 I've been
able to track it down to nova-conductor. AFAICT it receives the message from
nova-compute, takes it from the queue, acks the queue, and selects the
object from the DB. However, after moving nova-compute and nova-conductor to
trace-level logging for amqp and sqlalchemy, the issue appears to stop. I
have yet to confirm whether the rabbit cluster state changed, whether the
change in logging level made the difference, or whether it was something
else.

On Tue, May 6, 2014 at 12:42 PM, Roman Sokolkov <[email protected]> wrote:
> Hello, fuelers.
>
> I'm using Fuel 4.1A + Havana in HA mode.
>
> I permanently observe (on other deployments also) issue with stuck
> "nova-compute" service. But i think problem is more fundamental and relates
> to HA RabbitMQ and OpenStack AMQP driver implementation.
>
> Symptoms:
>
> Random nova-compute from time to time marked as "XXX" for a while.
> I see that service itself works properly. In logs i see that it sends status
> updates to conductor. But actually nothing is sent.
> "netstat" shows that all connections to/from rabbit "ESTABLISHED"
> rabbitmqctl shows that "compute.node-x" queue synced to all slaves.
> nothing has been broken before, i mean rabbitmq cluster, etc.
>
> Axe style solution:
>
> /etc/init.d/openstack-nova-compute restart
>
> So here i've found a lot of interesting stuff (and solutions):
>
> https://bugs.launchpad.net/oslo.messaging/+bug/856764
>
>
> My questions are:
>
> Are there any thoughts particular for Fuel to solve/workaround this issue?
> Any fast solution for this in 4.1? Like adjust TCP keep-alive timeouts?
>
>
> --
> Roman Sokolkov,
> Deployment Engineer,
> Mirantis, Inc.
> Skype rsokolkov,
> [email protected]
>
> --
> Mailing list: https://launchpad.net/~fuel-dev
> Post to : [email protected]
> Unsubscribe : https://launchpad.net/~fuel-dev
> More help : https://help.launchpad.net/ListHelp
>

--
Andrew
Mirantis
Ceph community
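
For anyone who wants to repeat the logging experiment described in the reply, here is a minimal sketch of raising verbosity for the AMQP and SQLAlchemy loggers with Python's standard logging module. The logger names ("amqp", "sqlalchemy.engine") and both helper functions are assumptions for illustration only, and the standard library has no TRACE level, so DEBUG stands in for it. This is not how nova itself is configured; it only shows the general idea of bumping specific loggers and putting them back afterwards.

```python
import logging

# Assumed logger names, for illustration only; the names used by a given
# nova release and its AMQP driver may differ.
VERBOSE_LOGGERS = ("amqp", "sqlalchemy.engine")


def raise_log_levels(names=VERBOSE_LOGGERS, level=logging.DEBUG):
    """Set each named logger to `level` and return the previous levels."""
    previous = {}
    for name in names:
        logger = logging.getLogger(name)
        previous[name] = logger.level  # remember what was configured before
        logger.setLevel(level)
    return previous


def restore_log_levels(previous):
    """Put the loggers back to the levels recorded by raise_log_levels()."""
    for name, level in previous.items():
        logging.getLogger(name).setLevel(level)
```

Keeping the previous levels around makes it easy to switch the extra logging off again and check whether the stuck-service symptom returns.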

