On 4/14/2016 3:07 PM, Andrew Laski wrote:
On Wed, Apr 13, 2016, at 12:27 PM, Dmitry Stepanenko wrote:
Hi Team,
I worked on the nova quota statistics issue
(https://bugs.launchpad.net/nova/+bug/1284424) that happens when nova-*
processes are restarted while instances are being removed, and I was able
to reproduce it. For the repro I used devstack and started nova-api and
nova-compute in separate screen windows. To kill them I used
ctrl+c. I found that the issue occurs if nova-* processes are killed
after the instance has been deleted but right before the quota commit
procedure finishes.
We discussed these results with Markus Zoeller and decided that even
though killing nova processes is a somewhat exotic event, this should
still be fixed because quota counting affects billing and is very
important for us.
+1. This is very important to get right. And while killing Nova
processes is exotic during normal operation, it could happen during
upgrades, and that should not cause quota issues.
So, we need to introduce some mechanism that will prevent us from
reaching inconsistent states in terms of quotas. In other words, this
mechanism should work in such a way that the instance create/remove
operation and the quota usage recount operation either both happen or
neither happens.
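The all-or-nothing requirement described above can be illustrated with a
single database transaction. This is only a minimal sketch under stated
assumptions: the two-table schema, the function names and the SimulatedKill
exception are hypothetical and heavily simplified, not nova's real schema or
code, and sqlite3 stands in for the real database. In nova the instance
delete and the quota commit are separate steps, so this shows the target
property rather than a drop-in fix.

```python
import sqlite3


class SimulatedKill(Exception):
    """Stands in for the process being killed mid-operation."""


def make_db():
    # Hypothetical, heavily simplified schema; not nova's real tables.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances ("
        "id INTEGER PRIMARY KEY, project_id TEXT, deleted INTEGER DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE quota_usages (project_id TEXT PRIMARY KEY, in_use INTEGER)"
    )
    conn.execute("INSERT INTO instances (id, project_id) VALUES (1, 'demo')")
    conn.execute(
        "INSERT INTO quota_usages (project_id, in_use) VALUES ('demo', 1)"
    )
    conn.commit()
    return conn


def delete_instance(conn, instance_id, project_id, die_midway=False):
    # "with conn" commits if the body succeeds and rolls back if it raises,
    # so both UPDATEs are applied together or not at all.
    with conn:
        conn.execute(
            "UPDATE instances SET deleted = 1 WHERE id = ?", (instance_id,)
        )
        if die_midway:
            # Interruption after the delete but before the quota update.
            raise SimulatedKill()
        conn.execute(
            "UPDATE quota_usages SET in_use = in_use - 1 WHERE project_id = ?",
            (project_id,),
        )


def state(conn, project_id):
    # Returns (active instance count, recorded quota usage).
    active = conn.execute(
        "SELECT COUNT(*) FROM instances WHERE project_id = ? AND deleted = 0",
        (project_id,),
    ).fetchone()[0]
    in_use = conn.execute(
        "SELECT in_use FROM quota_usages WHERE project_id = ?", (project_id,)
    ).fetchone()[0]
    return active, in_use
```

Calling delete_instance(conn, 1, 'demo', die_midway=True) leaves both the
instance row and the usage counter unchanged, because the partial update is
rolled back. With a real process kill no Python rollback code would run, but
the database would still discard the uncommitted transaction.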
There's been some discussion around this, and there are other ML threads
somewhat discussing it in the context of moving quota enforcement into a
centralized service/library. There are a couple of approaches that could
be taken for tackling quotas, but a larger issue is that we have no good
way of knowing if some change helps the situation. What we need before
making any changes is a functional test that reproduces the issue.
Once that is in place I would love to see the removal of the
quota_usages table and reservations and have quota be based on actual
usage represented in the instances table. But there are a lot of other
viewpoints and I think work in this area is going to have to start
making small incremental improvements.
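To illustrate the idea of basing quota on actual usage in the instances
table, here is a rough sketch. The schema and the helper names
(count_in_use, within_quota) are assumptions made up for this example, not
nova's real tables or API, and sqlite3 is used only to keep it
self-contained. The point is that with no separate usage counter there is
nothing that can drift out of sync when a process dies.

```python
import sqlite3


def make_db():
    # Hypothetical simplified instances table; there is no quota_usages
    # or reservations table in this variant.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances ("
        "id INTEGER PRIMARY KEY, project_id TEXT, deleted INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO instances (id, project_id) VALUES (?, ?)",
        [(1, "demo"), (2, "demo"), (3, "other")],
    )
    conn.commit()
    return conn


def count_in_use(conn, project_id):
    # Usage is derived on demand from the rows that actually exist.
    return conn.execute(
        "SELECT COUNT(*) FROM instances WHERE project_id = ? AND deleted = 0",
        (project_id,),
    ).fetchone()[0]


def within_quota(conn, project_id, limit, requested=1):
    # True if creating `requested` more instances stays within `limit`.
    return count_in_use(conn, project_id) + requested <= limit


def delete_instance(conn, instance_id):
    # A single-row update: usage changes in the same step as the delete.
    with conn:
        conn.execute(
            "UPDATE instances SET deleted = 1 WHERE id = ?", (instance_id,)
        )
```

Here the delete is one statement, so an interruption either happens before
it (instance still counted) or after it (instance no longer counted). The
trade-off is that every quota check has to run a count query, and concurrent
creates still need some protection against racing past the limit.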
Any ideas how to do that properly?
Kind regards,
Dmitry
____________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
I've tried to start that here [1] but it needs work. I have a messier
local version too that was (I think) reproducing a failure, but because
it's a weird race condition mess, it's kind of hard to test and know
when to assert the thing and stop the test.
Maybe I'll just push up the latest WIP of what I have locally and then
someone else can take it over if they want.
[1] https://review.openstack.org/#/c/293800/
--
Thanks,
Matt Riedemann