[ https://issues.apache.org/jira/browse/LIBCLOUD-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127002#comment-15127002 ]
Eric Johnson commented on LIBCLOUD-799:
---------------------------------------

Hi Colin,

Thank you for filing this issue. For the case you describe (concurrent deletes), many places in libcloud would exhibit the same kind of error, and making all of them robust against every such race would be very challenging.

While putting a try/except around line 5283 would fix your immediate problem, it could actually lead to another bug. Imagine the scenario where the node object's extra['disks'] entry contains a boot disk, but does not get populated with data from the call in line 5283 (because the disk had been deleted separately). Now you have an entry in extra['disks'] for a boot disk that no longer exists, so a subsequent call to destroy_node() will also fail because the boot disk is already deleted. And I'm sure there are other use cases where concurrent operations would cause bugs even if we try to guard against them in the driver.

I think a better approach would be to handle the concurrency issues in your code, since not all users will be doing concurrent or out-of-band requests with libcloud.

I'd like to close this issue if you're OK with that.

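As a rough illustration of handling this on the caller's side, the sketch below retries the listing when a resource disappears mid-call. It is only one possible approach, not something libcloud provides: list_nodes_with_retry, its parameters, and FlakyDriver are hypothetical names, and the local ResourceNotFoundError class is a stand-in so the snippet runs without libcloud. In real code you would presumably pass libcloud.common.google.ResourceNotFoundError (the exception in the traceback below) as not_found_exc.

```python
import time


class ResourceNotFoundError(Exception):
    """Local stand-in for libcloud.common.google.ResourceNotFoundError."""


def list_nodes_with_retry(driver, retries=3, delay=1.0,
                          not_found_exc=ResourceNotFoundError):
    """Call driver.list_nodes(), retrying if a resource vanishes mid-listing.

    Hypothetical helper: `retries` is the total number of attempts and
    `delay` is the pause in seconds between attempts.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return driver.list_nodes()
        except not_found_exc:
            # Most likely a node or disk was deleted while we were listing.
            if attempt == retries:
                raise  # out of attempts, let the caller see the error
            time.sleep(delay)


class FlakyDriver(object):
    """Hypothetical driver that fails on the first call, then succeeds."""

    def __init__(self):
        self.calls = 0

    def list_nodes(self):
        self.calls += 1
        if self.calls == 1:
            raise ResourceNotFoundError("disk was not found")
        return ["node-a", "node-b"]
```

With a real GCE driver the call would look like list_nodes_with_retry(driver, not_found_exc=libcloud.common.google.ResourceNotFoundError). A retry only narrows the race window; a delete that lands during every attempt would still surface the error to the caller.
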
> GCE: list_nodes occasionally failing with ResourceNotFoundError when instances being deleted
> --------------------------------------------------------------------------------------------
>
>                 Key: LIBCLOUD-799
>                 URL: https://issues.apache.org/jira/browse/LIBCLOUD-799
>             Project: Libcloud
>          Issue Type: Bug
>          Components: Core
>            Reporter: Colin Pitrat
>
> I'm using libcloud version 0.18.0 (version not available in the dropdown list above).
> When listing instances on GCE while I (or another user) concurrently delete instances on the same project, I occasionally get the following exception:
>
>   File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 1601, in list_nodes
>     v.get('instances', [])]
>   File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 5283, in _to_node
>     extra['boot_disk'] = self.ex_get_volume(bd['name'], bd['zone'])
>   File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 4165, in ex_get_volume
>     response = self.connection.request(request, method='GET').object
>   File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 120, in request
>     response = super(GCEConnection, self).request(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/libcloud/common/google.py", line 692, in request
>     *args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/libcloud/common/base.py", line 799, in request
>     response = responseCls(**kwargs)
>   File "/usr/lib/python2.7/site-packages/libcloud/common/base.py", line 145, in __init__
>     self.object = self.parse_body()
>   File "/usr/lib/python2.7/site-packages/libcloud/common/google.py", line 263, in parse_body
>     raise ResourceNotFoundError(message, self.status, code)
> libcloud.common.google.ResourceNotFoundError: {u'domain': u'global', u'message': u"The resource 'projects/xxxx/zones/xxxx/disks/xxxx-5802f' was not found", u'reason': u'notFound'}
>
> I think the exception should be caught in "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 5283, when the volume corresponding to the instance being deleted is not found.
>
> Regards,
> Colin