Folks, what is the current status on this? I saw a few comments in the bug <https://bugs.launchpad.net/fuel/+bug/1285449>, but I am wondering what action items the European timezone can pick up on Monday to keep the investigation moving.
Thanks,

On Fri, Feb 28, 2014 at 9:58 PM, Dmitry Borodaenko <[email protected]> wrote:
> Dear all,
>
> Please make sure that all discussions that occur elsewhere (this ML
> thread, chats, etc.) end up reflected in the LaunchPad bug (even if a
> theory is discussed and then eliminated, it's useful to have it
> mentioned in the bug so that other people don't repeat the same line
> of investigation). I originally emailed fuel-dev@ to only attract
> attention to the problem, I did not intend to split the discussion.
>
> Thanks,
>
> On Fri, Feb 28, 2014 at 8:35 AM, Matthew Mosesohn
> <[email protected]> wrote:
> > I started reaching out to our community folks, Dina and Dmitry.
> >
> > We tried a few variations, but the same result: nova and cinder
> > dislike having the AMQP backend shifted from underneath it.
> >
> > If we remove haproxy and connect directly to RabbitMQ on a virtual IP,
> > all nova and cinder services die when we shift the virtual IP to
> > another node. Neutron somehow survives and reconnects in about 25
> > seconds and picks up where it left off.
> >
> > For the record, we're running on 2013.2.2 code. Dmitry Mescheryakov
> > asked me to provide a diff of what the RPC code is between neutron and
> > cinder to maybe determine why Neutron can resume connections, but
> > Cinder surely doesn't. Here is this diff:
> > http://paste.openstack.org/show/uXyeYUGxMiAhmcGlK8VZ/
> >
> > For more info:
> > Errors we see in Cinder logs:
> > http://pastie.org/private/w8iigjzijfczvsw5ddelwq
> > Errors we see in Neutron logs:
> > http://pastie.org/private/uelxryhbr42jijip0loe2w
> >
> > In the bug, mentioned earlier in this thread, we have a diagnostic
> > snapshot.
> >
> > We're still digging for leads to fix this HA failover issue.
> >
> > -Matthew
> >
> > On Fri, Feb 28, 2014 at 1:12 PM, Vladimir Kuklin <[email protected]> wrote:
> >> It will not help if you shut down the controller. The problem is that
> >> you have hanged AMQP sessions which kombu driver does not look to
> >> handle correctly.
> >>
> >>
> >> On Fri, Feb 28, 2014 at 1:09 PM, Bogdan Dobrelya <[email protected]>
> >> wrote:
> >>>
> >>> On 02/28/2014 05:44 AM, Dmitry Borodaenko wrote:
> >>> > Team,
> >>> >
> >>> > Me and Ryan have spent all day investigating
> >>> > https://bugs.launchpad.net/fuel/+bug/1285449
> >>> >
> >>> > What we have found so far confirms that this is a critical bug that
> >>> > absolutely must be resolved before 4.1 is released. I have documented
> >>> > our findings in the bug comments, someone please take over the
> >>> > investigation when you come to the office tomorrow morning MSK time.
> >>> >
> >>> > I have a feeling that once the root cause is found, the fix will be
> >>> > low-impact and will involve either change in HAProxy configuration for
> >>> > RabbitMQ, a patch/upgrade of HAProxy or kombu, or something similar.
> >>> > But first we need to understand what exactly breaks, and why this only
> >>> > affects some services and not all of them.
> >>> >
> >>> > Thanks,
> >>> >
> >>>
> >>> Here is recent rabbitMQ discussion quote from the
> >>> Fuel-conductors-support team skype chat (RU + translation):
> >>>
> >>> Wednesday, February 26, 2014
> >>> [4:00:10 PM] Maxim Yefimov: Коллеги, вопрос есть:
> >>> (I have a question)
> >>>
> >>> listen rabbitmq-openstack
> >>>   bind 192.168.0.2:5672
> >>>   balance roundrobin
> >>>
> >>>   server node-1 192.168.0.3:5673 check inter 5000 rise 2 fall 3
> >>>   server node-2 192.168.0.4:5673 check inter 5000 rise 2 fall 3 backup
> >>>   server node-3 192.168.0.5:5673 check inter 5000 rise 2 fall 3 backup
> >>>
> >>> [4:01:01 PM] Maxim Yefimov: Зачем одновременно roundrobin и
> >>> active-passive?
> >>> (Why do we use roundrobin and active-passive at once for RabbitMQ?)
> >>>
> >>> [4:01:39 PM] Miroslav Anashkin: Чтобы коннект не рвался
> >>> (To make sure the connection wouldn't break)
> >>>
> >>> [4:02:01 PM] Miroslav Anashkin: У кролика кластер существует строго в
> >>> виде мастер-слейв
> >>> (RabbitMQ clustering is restricted to master-slave only)
> >>>
> >>> [4:02:23 PM] Miroslav Anashkin: Соответственно даже если какая-то нода с
> >>> запросом к слейву придет - та его на мастер отправит
> >>> (Hence, any node's query to the RabbitMQ slave would have been re-sent
> >>> to the master)
> >>>
> >>> [4:02:52 PM] Miroslav Anashkin: Поэтому сделали так чтобы ХАПрокси
> >>> всегда всех посылал на одну ноду
> >>> (Thats why HAproxy always redirects all queries to the single RabbitMQ
> >>> node)
> >>>
> >>> And I'm not clear with this explanation, honestly. Why couldn't we make
> >>> OS establish direct connections to arbitrary (LB) chosen RabbitMQ nodes
> >>> skipping HAproxy at all? (because of this: "any node's query to the
> >>> RabbitMQ slave would have been re-sent to the master")
> >>>
> >>> Could that resolve the issue? I think I will investigate this option as
> >>> well.
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Bogdan Dobrelya,
> >>> Skype #bogdando_at_yahoo.com
> >>> Irc #bogdando
> >>>
> >>> --
> >>> Mailing list: https://launchpad.net/~fuel-dev
> >>> Post to     : [email protected]
> >>> Unsubscribe : https://launchpad.net/~fuel-dev
> >>> More help   : https://help.launchpad.net/ListHelp
> >>
> >>
> >> --
> >> Yours Faithfully,
> >> Vladimir Kuklin,
> >> Senior Deployment Engineer,
> >> Mirantis, Inc.
> >> +7 (495) 640-49-04
> >> +7 (926) 702-39-68
> >> Skype kuklinvv
> >> 45bk3, Vorontsovskaya Str.
> >> Moscow, Russia,
> >> www.mirantis.com
> >> www.mirantis.ru
> >> [email protected]
>
> --
> Dmitry Borodaenko

--
Mike Scherbakov
#mihgen
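
For anyone following the roundrobin vs. active-passive question from the chat quoted above, here is a minimal Python sketch of how the quoted `listen rabbitmq-openstack` block is expected to behave. It is only a model and does not talk to HAProxy. It assumes HAProxy's documented default for `backup` servers: they receive traffic only when every non-backup server is down, and then only the first healthy backup is used (`option allbackups` is not in the quoted config and is ignored here). The names `Server`, `parse_servers` and `eligible` are hypothetical helpers made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    address: str
    backup: bool


def parse_servers(config):
    """Extract the `server` lines from an HAProxy listen block."""
    servers = []
    for line in config.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "server":
            # Anything after the address is an option; we only care about `backup`.
            servers.append(Server(fields[1], fields[2], "backup" in fields[3:]))
    return servers


def eligible(servers, healthy):
    """Return the servers that would receive new connections.

    Assumption: default HAProxy semantics, i.e. backups are used only when
    all non-backup servers are down, and only the first healthy backup.
    """
    active = [s for s in servers if not s.backup and s.name in healthy]
    if active:
        return active
    backups = [s for s in servers if s.backup and s.name in healthy]
    return backups[:1]


# The listen block quoted in the chat log.
CONFIG = """
listen rabbitmq-openstack
  bind 192.168.0.2:5672
  balance roundrobin

  server node-1 192.168.0.3:5673 check inter 5000 rise 2 fall 3
  server node-2 192.168.0.4:5673 check inter 5000 rise 2 fall 3 backup
  server node-3 192.168.0.5:5673 check inter 5000 rise 2 fall 3 backup
"""

if __name__ == "__main__":
    servers = parse_servers(CONFIG)
    all_nodes = {"node-1", "node-2", "node-3"}
    # All nodes healthy: roundrobin has exactly one candidate.
    print([s.name for s in eligible(servers, all_nodes)])
    # node-1 down: every client is moved to the first backup.
    print([s.name for s in eligible(servers, all_nodes - {"node-1"})])
```

Under these assumptions `balance roundrobin` only ever has one candidate to rotate over, so the setup is effectively active-passive, and a failure of node-1 moves every client connection to node-2 at the same moment. That is the failover event under which the nova and cinder AMQP sessions were reported to hang.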

