Excerpts from Davanum Srinivas (dims)'s message of 2015-08-16 17:40:16 -0400:
> Doug,
>
> I've filed https://review.openstack.org/213542 to log error messages. Will
> work with oslo.messaging folks the next few days.

Thanks, Dims!

>
> Thanks,
> Dims
>
> On Fri, Aug 14, 2015 at 6:58 PM, Doug Hellmann <d...@doughellmann.com>
> wrote:
>
> > All patches to oslo.messaging are currently failing the
> > gate-tempest-dsvm-neutron-src-oslo.messaging job because the neutron
> > service dies. amuller, kevinbenton, and I spent a bunch of time looking at
> > it today, and I think we have an issue introduced by some asymmetric gating
> > between the two projects.
> >
> > Neutron has 2 different modes for starting the RPC service, depending on
> > the number of workers requested. The problem comes up with rpc_workers=0,
> > which is the new default. In that mode, rather than using the
> > ProcessLauncher, the RPC server is started directly in the current process.
> > That results in wait() being called in a way that violates the new
> > constraints being enforced within oslo.messaging after [1] landed. That
> > patch is unreleased, so the only project seeing the problem is
> > oslo.messaging. I’ve proposed a revert in [2], which passes the gate tests.
> >
> > I have also added [3] to neutron to see if we can get the gate job to show
> > the same error messages I was seeing locally (part of the trouble we’ve had
> > with debugging this is the process exits quickly enough that some of the
> > log messages are never being written). I’m using [4] as a patch in
> > oslo.messaging that was failing before to trigger the job to get the
> > necessary log. That patch should *not* be landed, since I don’t think the
> > change it reverts is related to the problem, it was just handy for
> > debugging.
> >
> > The error message I see locally, “start/stop/wait must be called in the
> > same thread”, is visible in this log snippet [5].
> >
> > It’s not clear what the best path forward is. Obviously neutron is doing
> > something with the RPC server that oslo.messaging doesn’t expect/want/like,
> > but also obviously we can’t release oslo.messaging in its current state and
> > break neutron.
> > Someone with a better understanding of both neutron and
> > oslo.messaging may be able to fix neutron’s use of the RPC code to avoid
> > this case. There may be other users of oslo.messaging with the same
> > ‘broken’ pattern, but IIRC neutron is unique in the way it runs both RPC
> > and API services in the same process. To be safe, though, it may be better
> > to log error messages instead of doing whatever we’re doing now to cause
> > the process to exit. We can then set up a logstash search for the error
> > message and find other applications that would be broken, fix them, and
> > then switch oslo.messaging back to throwing an exception.
> >
> > I’m going to be at the Ops summit next week, so I need to hand off
> > debugging and fixing the issue to someone else on the Oslo team. We created
> > an etherpad to track progress and make notes today, and all of these links
> > are referenced there, too [6].
> >
> > Thanks again to amuller and kevinbenton for the time they spent helping
> > with debugging today!
> >
> > Doug
> >
> > [1] https://review.openstack.org/#/c/209043/
> > [2] https://review.openstack.org/#/c/213299/
> > [3] https://review.openstack.org/#/c/213360/
> > [4] https://review.openstack.org/#/c/213297/
> > [5] http://paste.openstack.org/show/415030/
> > [6] https://etherpad.openstack.org/p/wm2D6UGZbf
> >

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
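
For anyone following the thread, here is a rough sketch of the "log instead of raise" idea discussed above. It is not the oslo.messaging code and not what the linked reviews contain; ThreadAffinityGuard, its strict flag, and the exact log text beyond the quoted error message are all made up for illustration. It assumes the constraint is simply that start(), stop(), and wait() must all be called from one thread.

```python
import logging
import threading

LOG = logging.getLogger(__name__)

# Error text quoted in the thread; everything else here is hypothetical.
_MESSAGE = "start/stop/wait must be called in the same thread"


class ThreadAffinityGuard:
    """Hypothetical helper, NOT part of oslo.messaging.

    The first thread to call start(), stop(), or wait() becomes the owner.
    A call from any other thread is logged (strict=False) or raises
    RuntimeError (strict=True).
    """

    def __init__(self, strict=False):
        self._strict = strict
        self._owner = None  # ident of the owning thread, set on first call
        self._lock = threading.Lock()

    def _check(self, method):
        current = threading.get_ident()
        with self._lock:
            if self._owner is None:
                self._owner = current
        if current == self._owner:
            return True
        detail = "%s (%s called from thread %d, owner is thread %d)" % (
            _MESSAGE, method, current, self._owner)
        if self._strict:
            # The behaviour to switch back to once callers are fixed.
            raise RuntimeError(detail)
        # Interim behaviour: leave a searchable log line, keep running.
        LOG.error(detail)
        return False

    def start(self):
        return self._check("start")

    def stop(self):
        return self._check("stop")

    def wait(self):
        return self._check("wait")
```

With strict=False a cross-thread wait() only produces an error log line containing the quoted message, which is the kind of thing a log search could then pick up; with strict=True the same call raises.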