[Openstack] Rechecking changes if Jenkins fails
Hi all,

Is there some way to trigger another check when Jenkins fails because of issues unrelated to the change itself? For example, the last time I submitted https://review.openstack.org/14374, Jenkins failed the gate-nova-docs job, but only because a package could not be downloaded properly:

    01:31:58 Downloading/unpacking httplib2 (from -r /home/jenkins/workspace/gate-nova-docs/tools/pip-requires (line 20))
    01:31:58 Hash of the package http://pypi.openstack.org/httplib2/httplib2-0.7.6.tar.gz#md5=3f440fff00e0d2d3c2971693de283cdf (from http://pypi.openstack.org/httplib2/) (md5 HASH object @ 0x1ac73a0) doesn't match the expected hash 3f440fff00e0d2d3c2971693de283cdf!
    01:31:58 Bad md5 hash for package http://pypi.openstack.org/httplib2/httplib2-0.7.6.tar.gz#md5=3f440fff00e0d2d3c2971693de283cdf (from http://pypi.openstack.org/httplib2/)

It seems I could not do anything about it apart from waiting for another change to be merged and triggering "Rebase Change". This is the second time I've run into an issue like that. I know some projects work around it by running the checks again after a comment containing a specific keyword (recheck, rekick, ...). Is there any similar system working / planned for Openstack?

Regards,
Stanisław Pitucha
Cloud Services
Hewlett Packard
Re: [Openstack] Discussion / proposal: deleted column marker
Hi Johannes,

I know the names collide here, but since this technique is known as soft-deletes... we need more namespaces :)

Thanks for the idea of grepping for read_deleted. Fortunately I think the situation isn't as bad as it would seem. Let me group the places which change read_deleted in the code (many results from grep are just comments). Reading only deleted entries, or all:

- xenserver (instance) - cleanup tool - I don't do xen, so I'm not sure how useful it is. Anyone?
- tests - can be ignored - if there is no functionality, the tests can be killed
- sqlalchemy api (instance) - a fixed ip can reference a deleted instance (tricky situation; from the commit message: "It adds a test to verify that the code works with a duplicate deleted floating_ip" - this seems very wrong...)
- sqlalchemy api (iscsi) - getting deleted iscsi targets which are still referenced by a volume
- sqlalchemy api (various) - instance migration, s3image, bandwidth, storage manager, flavors (only available from nova-manage)
- compute manager (instance) - reaping deleted instances - I can't see why the same logic wouldn't apply if the rows were actually missing (needs analysis, maybe there's a reason)
- compute instance_types (flavour) - apparently flavours are usually read even if they're deleted
- network manager (instance) - making sure that ips/networks can be removed even if the instance is already deleted

So here's what I can see: pretty much all the usage is about deleting instances, or making sure that parts connected to an instance go away if the instance was deleted earlier. It doesn't seem right, but it could be progressively fixed. It looks like just another state of the instance, which could be integrated into the other state fields. Nothing else uses the deleted column explicitly (unless I missed something - possible). Ips, networks, keys - anything that actually goes away permanently (and doesn't involve a big chain of cleanup events) - are never read back once they're marked as deleted.

So maybe a better approach is not to remove the deleted column completely, but to start stripping it from the places where it's not needed (fixed/floating ips, networks, ssh keys being good candidates). This could be done by creating a new layer over NovaBase and removing the deleted marker from NovaBase itself. Or maybe even by creating a SoftDeleteMixin, applying it to all current models, and then removing it where it's unnecessary. That would keep the current behaviour where it's currently needed, but at the same time it would provide a known migration path to get rid of it. We could then start stripping the new layer table by table, adding unique constraints where they make sense, before trying to tackle the really tricky parts (for the instances table, maybe the marker actually makes sense? maybe not? - it's definitely not going to be an easy decision/fix).
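To make the mixin idea concrete, here's a minimal sketch (assuming SQLAlchemy's declarative mixins; model and column names are illustrative, not the real nova schema):

    # sketch only: models opt into soft-deletes via a mixin, so the mixin
    # can later be removed table by table
    from sqlalchemy import Boolean, Column, DateTime, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class SoftDeleteMixin(object):
        deleted = Column(Boolean, default=False)
        deleted_at = Column(DateTime, nullable=True)

    # keeps today's behaviour: instances still soft-delete
    class Instance(SoftDeleteMixin, Base):
        __tablename__ = 'instances'
        id = Column(Integer, primary_key=True)
        host = Column(String(255))

    # already stripped: key pairs delete for real, so a plain unique
    # constraint finally becomes possible
    class KeyPair(Base):
        __tablename__ = 'key_pairs'
        id = Column(Integer, primary_key=True)
        name = Column(String(255), unique=True)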
Regards,
Stanisław Pitucha
Cloud Services
Hewlett Packard

-----Original Message-----
From: openstack-bounces+stanislaw.pitucha=hp@lists.launchpad.net On Behalf Of Johannes Erdfelt
Sent: Tuesday, October 02, 2012 6:12 PM
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] Discussion / proposal: deleted column marker

On Tue, Oct 02, 2012, Pitucha, Stanislaw Izaak stanislaw.pitu...@hp.com wrote:

> Does anyone know why soft-delete is still in place? Are there any
> reasons it can't / shouldn't be removed at this time? If it's possible
> to remove it, would you miss it?

I'm certainly not a fan of the database soft-delete, for many of the same reasons you've described, but there are some places where removing it would require code changes. Off the top of my head, that would be pretty much anywhere a context sets read_deleted to 'yes' or 'only', which is a surprising number of places now that I've done a grep. I suspect at least a handful of those cases don't need the functionality, and others probably use it as a crutch around other problems.

JE
[Openstack] Discussion / proposal: deleted column marker
Hi all,

I'd like to open a discussion on a topic that's been bugging me for a number of reasons: soft deletes (by that I mean marking rows with deleted=1 in the db) and, related, actions audit.

Some research and speculation first. To be honest, I could not find any reason why the feature is there in the first place. Here's the commit that introduced the 'deleted' columns: https://github.com/openstack/nova/commit/ae6905b9f1ef97206ee3c8722cec3b26fc064f38 - unfortunately the description says only "Refactored orm to support atomic actions".

So the guessing part starts here. These are the possible uses for soft-deletion of the database records that I could come up with:

1. safety net (recover data that was deleted by accident)
2. audit / log (preserve the information about past data)
3. some kind of micro-optimisation where an update is more useful than a deletion - be it speed or ease of handling foreign key constraints (or, more likely, not handling them straight away)
4. ... no... that's all

But I think there are a number of issues with that approach. First, the issues with the possible uses above; then, issues that I can see otherwise. Point by point:

1. Soft-deletion probably makes some restoration possible, but I doubt there's much that could be done without a full analysis of the situation. Mainly because the database holds only the metainformation - the actual data users care about either goes away (ephemeral disks, memory, ...) or not (volumes, networks, ...) and is not recoverable. Since resources like ips and volumes can simply be reused in other instances, not all recovery is possible anyway. The most hardcore fixes could be done by reinserting the original/reconstructed data just as easily as by verifying what's safe to undelete. Both actions require looking at the existing data and locking out information so it doesn't get reused while we're messing with the previous state.

2. Soft-deleted records are not great as a source of old information. This is connected to the previous point - some resources are just reused / rewritten instead of created and deleted. For example, there's no record of what happens with old floating ips - the information gets overwritten when the IP is reassigned to a new instance, so the useful bits are gone.

3. This is the only thing I could come up with that relates to the commit message itself and the "support atomic actions" part. Maybe it was sometimes easier to mark something as deleted rather than to manage and properly order the deletes of a number of related entries.

So with that out of the way, here are a number of issues related to soft-deletes that I ran into myself:

4. Indexing all this data on a busy system is getting a bit silly. Unless you do your own cleanup of old entries, you will end up in a situation where looking up instances on a host actually looks through thousands of deleted rows, even if only around 20 or so can be live and interesting. I know it's not a huge deal, but it's still unnecessary cpu cycle burning.

5. Some things are just not possible to do in a safe and portable way at the moment. For example, adding a new network and fixed IPs (there's a bug for that: https://bugs.launchpad.net/nova/+bug/755138). I tried to fix this situation, but discovered that it's not possible using only sessions while the 'deleted' column is in place. There are ways to do it in a specific database (you can lock the whole table in mysql, for example), but then it's not portable.
The best you can do easily is limit the issue and hope that two inserts in different sessions won't happen at the same time. This could be done trivially with a unique constraint if the 'deleted' column wasn't there (see the sketch at the end of this message). I haven't checked, but I guess that anything that can be named (and should have a unique name) has the same problem - security groups, keys, instances, ...

6. The amount of data grows pretty quickly in a busy environment. It has to be cleaned up, but due to some constraints it can't be done easily in one go. Cleanup triggers help here, but that's additional work that needs maintenance during schema changes. Schema changes themselves get interesting when you're spending most of the conversion time on rows you really don't care about. There were also cases where a migration over many steps failed for some reason on very old rows (virtual interface related; I can't recall which step it was at the moment).

7. Not directly related, but I'll get back to it in the summary: owners of bigger deployments will either want to or be required to hold some record of various events and customer information. For example, to handle security abuse reports it would be great to know who owned a specific floating IP at a specific moment.

So what's my point? Any use case I can find right now is not really improved by the current schema. It doesn't look like there are many benefits, but there are definitely some downsides. Does anyone know why soft-delete is still in place? Are there any reasons it can't / shouldn't be removed at this time? If it's possible to remove it, would you miss it?

Regards,
Stanisław Pitucha
Cloud Services
Hewlett Packard
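P.S. To make point 5 concrete, here's a minimal sketch (SQLAlchemy, with illustrative table/column names) of what the 'deleted' column blocks:

    # sketch: what a unique constraint would buy us, and why soft-delete
    # gets in the way
    from sqlalchemy import Boolean, Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Network(Base):
        __tablename__ = 'networks'
        id = Column(Integer, primary_key=True)
        cidr = Column(String(43))
        deleted = Column(Boolean, default=False)

        # What we'd like: the database itself rejects a second row with the
        # same cidr, so two concurrent sessions can't both insert one:
        #     __table_args__ = (UniqueConstraint('cidr'),)
        # With soft-deletes this is unusable - deleting and recreating a
        # network would violate it, because the dead row keeps its cidr.
        # The usual workaround, UniqueConstraint('cidr', 'deleted'), only
        # works if 'deleted' stores a per-row value (e.g. the row id)
        # rather than a plain true/false flag.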
[Openstack] API startup issue (maybe PasteDeploy related)
Hi all,

I just tried to install nova from the folsom-2 package and ran into an issue with the api server which I can't really figure out.

Tracking the progress with a large number of strategically placed prints, I get to the stage where the api app (for example metadata) is being loaded in load_app() (from nova/wsgi.py). Then, going deeper into paste.deploy, I can see the configuration being loaded via the loadobj / loadcontext functions. Finally the context (LoaderContext) is returned and loadobj() tries to return context.create(). Here python just stops on a line which tries to access self.object_type. There's no exception, error or segfault - python just closes down.

If I change the create function to:

    def create(self):
        print 'creating context'
        print 'for object type %s' % (type(self.object_type),)
        return

the last message I see is "creating context". Any attempt to access object_type (even just to get its type) causes a crash (?), but doesn't leave any trace like a coredump.

Additionally I ran into http://bugs.python.org/issue1596321, but I don't think that's related, since no threads are started by this point (I think...).

I'm running on a clean Ubuntu Precise machine with:

- python 2.7.3-0ubuntu2
- python-pastedeploy 1.5.0-2build1

Does anyone have a good idea how to fix or at least debug it?

Regards,
Stanisław Pitucha
Cloud Services
Hewlett Packard
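P.S. One debugging option, in case the interpreter is actually dying on a signal despite leaving no core: the faulthandler module (a third-party package on Python 2.7, in the stdlib from 3.3) can dump the Python stack on a hard crash. A sketch:

    # put this as early as possible in the startup path, e.g. bin/nova-api
    # (pip install faulthandler on Python 2.7)
    import faulthandler
    faulthandler.enable()  # prints the Python traceback on SIGSEGV etc.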
Re: [Openstack] profiling nova-api
Just a small warning: since cProfile has its own overhead on entering/exiting every method and uses deterministic rather than statistical profiling, it may misrepresent tiny functions. They will appear to take much more wall-clock time simply due to the profiling overhead multiplied by the number of executions.

I'm not saying cfg is not slow, but in this case it may be worth confirming the actual time spent on each call with a minimalistic timeit test, without any profiler: request the same variable 100 times and see how much it really takes per iteration (a quick sketch is at the end of this message).

Regards,
Stanisław Pitucha
Cloud Services
Hewlett Packard

-----Original Message-----
From: openstack-bounces+stanislaw.pitucha=hp@lists.launchpad.net On Behalf Of Yun Mao
Sent: Wednesday, April 11, 2012 9:48 PM
To: openstack@lists.launchpad.net
Subject: [Openstack] profiling nova-api

Hi Stackers,

I spent some time looking at nova-api today. Setup: everything-on-one-node devstack, essex trunk. I set up 1 user with 10 tiny VMs. Client: 3 python threads, each doing a loop of the equivalent of "nova list" 100 times. So 300 API calls with concurrency=3.

How to profile:

    python -m cProfile -s time /opt/stack/nova/bin/nova-api --flagfile=/etc/nova/nova.conf --logdir=/var/log/nova --nodebug

The partial output is attached at the end.

Observations:

* It takes about 60 seconds to finish. CPU of nova-api is around 70% to 90%.

* Database access: each "nova list" API call will issue 4 db APIs: 3 instance_get_all_by_filters() and 1 instance_fault_get_by_instance_uuids(), so 1200 db API calls total (note: not necessarily 1200 SQL statements, could be more). The 900 instance_get_all_by_filters() calls took 30.2 seconds (i.e. 0.03s each)! The 300 instance_fault_get_by_instance_uuids() calls only took 1.129 seconds (0.004 each). You might think: MySQL sucks. Not so fast. Remember this is a tiny database with only 10 VMs. The profile also shows that the actual _mysql.connection.query() method only took 1.883 seconds in total. So we pretty much spend 29 seconds out of 60 doing either sqlalchemy stuff or our own wrappers. You can also see this from the sheer volume of sqlalchemy library calls involved.

* The cfg.py library inefficiency: during 300 API calls, common.cfg.ConfigOpts._get() is called 135005 times! And we paid 2.470 sec for that.

Hopefully this is useful for whoever wants to improve the performance of nova-api.
Thanks,
Yun

===

    23355694 function calls (22575841 primitive calls) in 77.874 seconds

    Ordered by: internal time

           ncalls  tottime  percall  cumtime  percall filename:lineno(function)
              812   25.725    0.032   25.725    0.032 {method 'poll' of 'select.epoll' objects}
             2408    1.883    0.001    1.883    0.001 {method 'query' of '_mysql.connection' objects}
            70380    1.667    0.000    7.187    0.000 expression.py:2263(corresponding_column)
           135005    1.254    0.000    2.470    0.000 cfg.py:1058(_get)
            41027    1.043    0.000    1.907    0.000 schema.py:542(__init__)
            38802    1.008    0.000    1.219    0.000 __init__.py:451(format)
           162206    0.821    0.000    0.821    0.000 util.py:883(values)
          1530666    0.773    0.000    0.774    0.000 {isinstance}
    135046/134960    0.716    0.000    1.919    0.000 cfg.py:1107(_substitute)
             1205    0.713    0.001    1.369    0.001 base.py:2106(__init__)
           183600    0.690    0.000    0.796    0.000 interfaces.py:954(_reduce_path)
            81002    0.687    0.000    2.492    0.000 compiler.py:312(visit_label)
            38802    0.650    0.000    6.087    0.000 log.py:227(format)
           319270    0.622    0.000    0.748    0.000 attributes.py:164(__get__)
    890242/884229    0.608    0.000    1.885    0.000 {getattr}
            40500    0.605    0.000    3.101    0.000 schema.py:955(_make_proxy)
    120783/120738    0.603    0.000    0.605    0.000 {method 'sub' of '_sre.SRE_Pattern' objects}
            81000    0.601    0.000    2.156    0.000 interfaces.py:677(create_row_processor)
            63000    0.590    0.000    0.707    0.000 times.py:44(DateTime_or_None)
            98102    0.588    0.000    0.886    0.000 compiler.py:337(visit_column)
           658098    0.580    0.000    0.581    0.000 {method 'intersection' of 'set' objects}
           109802    0.562    0.000    0.562    0.000 expression.py:3625(_from_objects)
      231610/1202    0.551    0.000    5.813    0.005 visitors.py:58(_compiler_dispatch)
           144002    0.510    0.000    0.693    0.000 compiler.py:622(_truncated_identifier)
    135005/134960    0.485    0.000    4.872    0.000 cfg.py:860(__getattr__)
             2408    0.463    0.000    1.942    0.001 {built-in method fetch_row}
            71100    0.460    0.000    0.580    0.000 strategies.py:121(create_row_processor)
           299031    0.437    0.000    0.437    0.000 {_codecs.utf_8_decode}
             6000    0.437    0.000    1.799    0.000 models.py:93(iteritems)
       36000/9000    0.409    0.000    4.791    0.001 mapper.py:2146(populate_state)
            81002    0.393    0.000    1.104    0.000
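P.S. A quick way to confirm the per-call cost of cfg without any profiler (a sketch only - the module path and option name are illustrative, use whatever option your setup defines):

    # time 100 plain attribute lookups; each one goes through
    # ConfigOpts.__getattr__ -> _get()
    import timeit

    setup = "from nova import flags; FLAGS = flags.FLAGS"
    total = timeit.timeit('FLAGS.host', setup=setup, number=100)
    print 'avg %.6f sec per access' % (total / 100)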
[Openstack] Various questions about connections to rabbitmq
Hi all,

Could someone explain the relation between the internal threading and the number of rabbitmq connections that can exist on a single service (in diablo final)? I'm wondering under what circumstances I can get multiple connections from a single compute or network manager to the rabbitmq server when using the kombu library. Could it use a connection pooling mechanism instead? I can see that pooling is not used by default, since multiple connections are started from both of those services at the moment.

The source of these questions is a strange thing I noticed: the number of open connections just keeps growing in my cluster, finally hitting the open files limit on rabbitmq... which is a bad thing of course. From the network dump it seems that most idle requests are method:get_instance_nw_info, and the service then gets the response with addresses, bridges etc. correctly. But each connection like that just stays open for a long time. It seems that sometimes the number of connections spikes up much more than the average - but I haven't had the occasion to capture it yet. So far all compute and network services have between 1 and 3 connections open each.

Has anyone seen this in their own environment and can give more details on this behavior?

Regards,
Stanisław Pitucha
Cloud Services
Hewlett Packard
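P.S. For reference, this is roughly what connection pooling looks like in kombu (a sketch only - the pool API exists from around kombu 1.2, and the URL and limit values are illustrative):

    # borrow connections from a bounded pool instead of opening new ones
    from kombu.connection import BrokerConnection

    conn = BrokerConnection('amqp://guest:guest@rabbit-host:5672//')
    pool = conn.Pool(limit=5)  # at most 5 real connections to rabbitmq

    with pool.acquire(block=True) as c:  # released back to the pool on exit
        c.ensure_connection()
        # ... publish or consume using c ...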
Re: [Openstack] Expectations for tracebacks
> I think all are bugs. Even if you understand some of them and consider
> them to be logical, you should not see ugly backtraces. You should see
> nice log lines any system administrator can read and understand clearly.

I agree. There are some other practical reasons for it too:

- Exceptions change between releases (line numbers, file names, call stack, etc.), so it can be hard to tell whether you're looking at the same thing or a different one.
- Exceptions take multiple lines and may contain the actual reason only in the last one. That means grep 'CRIT|ERROR' ... will not show the real message.
- It might not actually be a critical situation. Most exceptions probably are, but there are also cases like "couldn't remove X, because it's not there anymore" which might be mere warnings.

I'd be happy to see exceptions disappear completely from logs unless something really unexpected happens - then the stacktrace and message help a lot with debugging.

Regards,
Stanisław Pitucha
Cloud Services
Hewlett Packard
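P.S. In code, the distinction is just which logging call handles the expected cases. A self-contained sketch (the exception and helper are made up for illustration):

    import logging

    LOG = logging.getLogger(__name__)

    class NotFoundError(Exception):
        """Stand-in for a 'resource already gone' error."""

    def remove_resource(resource):
        raise NotFoundError(resource)  # pretend the resource vanished earlier

    def cleanup(resource):
        try:
            remove_resource(resource)
        except NotFoundError:
            # expected race - already gone: one greppable line, no traceback
            LOG.warning("could not remove %s: already deleted", resource)
        except Exception:
            # genuinely unexpected: keep the full stack trace for debugging
            LOG.exception("unexpected failure removing %s", resource)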
Re: [Openstack] [Nova] MySQL drivers in DB
I'm not that interested in engines - this does look out of scope, since forcing something here might prevent people from running mysql cluster (the ndb engine is required then). I was wondering about indexes though - I was actually going to submit a patch for that in the future. Since big, db-stopping changes are made anyway, is there any good reason for not having the indexes included?

An alternative which doesn't require doing this together with a schema migration would be a separate, completely optional migration for indexes. Not a real migration of course, but just a set of predefined indexes for each migration stage, so that at each stage you could run "nova-manage db reindex" and get the official set of indexes applied automatically (a sketch of what I mean is at the end of this message). Those should effectively ensure that you get no queries without an index.

Regards,
Stanisław Pitucha
Cloud Services
Hewlett Packard

-----Original Message-----
From: openstack-bounces+stanislaw.pitucha=hp@lists.launchpad.net On Behalf Of Gabe Westmaas
Sent: Wednesday, October 12, 2011 12:55 PM
To: Vishvananda Ishaya; Nick Sokolov
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] [Nova] MySQL drivers in DB

+1 on modifying it in your environment; this seems outside the scope of migrations, though perhaps listed as a best practice for the deployment process. Similarly, I have been trying to decide what makes sense for indexes. Right now it would be a bad idea to run nova without adding some indexes, but I'm also not sure this belongs in migrations - I was also thinking it belonged in a best practices document. I'm concerned about it being part of an upgrade and negatively affecting a deployment that might have millions of rows in a table to which a migration adds an index. At the same time, we have to deal with this when we do other operations on large tables (adding or removing columns, for example), so maybe I shouldn't be so concerned about it. Any thoughts?

Gabe

On 10/11/11 5:15 PM, Vishvananda Ishaya vishvana...@gmail.com wrote:

> For some reason tables are getting created as the default type. There is
> a migration in the history to convert tables to InnoDB, but anything
> created after that migration will go in as the default type. We can add
> another migration to convert all of the other tables, but I think the
> right method here might be to set the default table type in mysql to
> innodb before running nova-manage db sync.
>
> Vish
>
> On Oct 11, 2011, at 1:55 PM, Nick Sokolov wrote:
>
>> Hi stackers! I noticed that tables in the database use two engines
>> instead of one, but the model descriptions do not override
>> __table_args__ = {'mysql_engine': 'InnoDB'}. Is this a design decision,
>> a migration_repo bug, or something else?
mysql> SELECT table_name, table_type, engine FROM information_schema.tables;
+--------------------------------------+------------+--------+
| table_name                           | table_type | engine |
+--------------------------------------+------------+--------+
| (system tables here)                 |            |        |
| agent_builds                         | BASE TABLE | MyISAM |
| auth_tokens                          | BASE TABLE | InnoDB |
| block_device_mapping                 | BASE TABLE | MyISAM |
| certificates                         | BASE TABLE | InnoDB |
| compute_nodes                        | BASE TABLE | InnoDB |
| console_pools                        | BASE TABLE | InnoDB |
| consoles                             | BASE TABLE | InnoDB |
| export_devices                       | BASE TABLE | InnoDB |
| fixed_ips                            | BASE TABLE | InnoDB |
| floating_ips                         | BASE TABLE | InnoDB |
| instance_actions                     | BASE TABLE | InnoDB |
| instance_metadata                    | BASE TABLE | InnoDB |
| instance_type_extra_specs            | BASE TABLE | MyISAM |
| instance_types                       | BASE TABLE | InnoDB |
| instances                            | BASE TABLE | InnoDB |
| iscsi_targets                        | BASE TABLE | InnoDB |
| key_pairs                            | BASE TABLE | InnoDB |
| migrate_version                      | BASE TABLE | InnoDB |
| migrations                           | BASE TABLE | InnoDB |
| networks                             | BASE TABLE | InnoDB |
| projects                             | BASE TABLE | InnoDB |
| provider_fw_rules                    | BASE TABLE | MyISAM |
| quotas                               | BASE TABLE | InnoDB |
| security_group_instance_association  | BASE TABLE | InnoDB |
| security_group_rules                 | BASE TABLE | InnoDB |
| security_groups                      | BASE TABLE | InnoDB |
| services                             | BASE TABLE | InnoDB |
| snapshots                            | BASE TABLE | InnoDB |
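P.S. Here's roughly what I mean by the optional reindex step - a sketch only (the index names and columns are illustrative, and there's no such nova-manage command today):

    # apply a predefined, per-table set of indexes if they're missing
    from sqlalchemy import Index, MetaData, Table, create_engine

    # the "official" index set for the current migration stage (illustrative)
    INDEXES = {
        'instances': [('instances_host_idx', ['host']),
                      ('instances_project_id_idx', ['project_id'])],
        'fixed_ips': [('fixed_ips_address_idx', ['address'])],
    }

    def reindex(sql_connection):
        engine = create_engine(sql_connection)
        meta = MetaData(bind=engine)
        for table_name, specs in INDEXES.items():
            table = Table(table_name, meta, autoload=True)
            existing = set(index.name for index in table.indexes)
            for name, columns in specs:
                if name not in existing:
                    Index(name, *[table.c[col] for col in columns]).create(engine)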
Re: [Openstack] Nova DB Connection Pooling
The pain starts when your maximum memory usage crosses what you have available. Check http://dev.mysql.com/doc/refman/5.1/en/memory-use.html - especially the comments, which calculate the memory needed for N connections for both innodb and myisam (mysqltuner.pl will also calculate it for you). Hundreds of connections should be ok. Thousands... you should rethink it ;)

If you want to improve connection pooling, look at the MySQL Proxy project.

Regards,
Stanisław Pitucha
Cloud Services
Hewlett Packard
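P.S. The back-of-envelope formula from those comments, as a quick script (all sizes are example values, not recommendations):

    # worst case ~= global buffers + max_connections * per-thread buffers
    MB = 1024 * 1024

    global_buffers = (
        128 * MB      # innodb_buffer_pool_size
        + 32 * MB     # key_buffer_size
        + 16 * MB     # query_cache_size
    )
    per_connection = (
        1 * MB        # sort_buffer_size
        + 1 * MB      # read_buffer_size
        + 1 * MB      # read_rnd_buffer_size
        + 1 * MB      # join_buffer_size
        + 256 * 1024  # thread_stack
    )

    for n in (100, 500, 2000):
        total = global_buffers + n * per_connection
        print '%5d connections -> worst case ~%d MB' % (n, total / MB)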
[Openstack] mocking database and stubs
Hi all,

I've got some questions about testing the bits that affect the database. Currently some places use mox and some use db_fake / stubs - this unfortunately fails when I try to add another test.

In VlanManager.setup_compute_network() there's a call to db.network_get_by_instance. I needed to test some part of it that required specific mocking, so I followed what happens in other places of that file - changed db to self.db and used mox. This however made the xenapi tests fail, because they rely on db_fake in test_spawn_vlanmanager (which also calls setup_compute_network()).

So the way I see it now, my options are:

- rewrite the xenapi tests to use mox (lots of work, really don't want to go there)
- mix mox and db_fake in a single test in test_compute (ugly / don't want to be figuring out the dependencies there in the future)

... are there any other solutions? Is there a blessed way to do this in the future? Is it better to stick to db_fake or to mox?

Regards,
Stanisław Pitucha
Cloud Services
Hewlett Packard
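P.S. For context, this is roughly what the mox approach looks like - a self-contained sketch with stand-in classes, not the actual nova test code:

    import mox

    class FakeDB(object):
        def network_get_by_instance(self, context, instance_id):
            raise AssertionError('should have been stubbed out')

    class FakeManager(object):
        db = FakeDB()
        def setup_compute_network(self, context, instance_id):
            return self.db.network_get_by_instance(context, instance_id)

    # stub the db attribute on the manager, not the module-level nova.db
    mocker = mox.Mox()
    manager = FakeManager()
    mocker.StubOutWithMock(manager.db, 'network_get_by_instance')
    manager.db.network_get_by_instance(mox.IgnoreArg(), 42).AndReturn('fake-net')
    mocker.ReplayAll()

    assert manager.setup_compute_network('ctx', 42) == 'fake-net'

    mocker.VerifyAll()
    mocker.UnsetStubs()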