[jira] [Commented] (LIBCLOUD-799) GCE: list_nodes occasionally failing with ResourceNotFoundError when instances being deleted

Eric Johnson (JIRA) Mon, 01 Feb 2016 12:50:13 -0800

    [ 
https://issues.apache.org/jira/browse/LIBCLOUD-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127002#comment-15127002
 ]


Eric Johnson commented on LIBCLOUD-799:
---------------------------------------

Hi Colin,

Thank you for filing this issue.  For the case you describe (concurrent 
deletes), there are many places in libcloud that would exhibit the same kind 
of error, and making the driver robust against all of them would be very 
challenging.

While putting a try/except around line 5283 would fix your immediate problem, 
it could actually lead to another bug. Imagine the scenario where the node 
object's extra['disks'] entry contains a boot disk, but extra['boot_disk'] 
does not get populated by the call on line 5283 (because the disk had been 
deleted separately). Now you have an entry in extra['disks'] for the boot disk 
even though it doesn't exist, so a subsequent call to 'destroy_node()' will 
also fail since the boot disk is already deleted.  And I'm sure there are 
other use-cases where concurrent operations would cause bugs even if we try to 
guard against them in the driver.

I think a better approach would be to handle the concurrency issues in your 
own code, since not all users will be making concurrent or out-of-band 
requests with libcloud.
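
As a rough illustration of handling this on the caller's side, here is a 
minimal sketch that retries list_nodes() when a resource disappears 
mid-listing. The helper name list_nodes_with_retry and its parameters are 
hypothetical (they are not part of libcloud); the exception class is passed in 
by the caller, and the retry count and delay are arbitrary assumptions.

```python
import time


def list_nodes_with_retry(driver, not_found_exc, attempts=3, delay=1.0,
                          sleep=time.sleep):
    """Call driver.list_nodes(), retrying if a resource vanishes mid-listing.

    Hypothetical helper, not a libcloud API. `not_found_exc` is the exception
    class to treat as transient; `sleep` is injectable so tests can skip waits.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(attempts):
        try:
            return driver.list_nodes()
        except not_found_exc as exc:
            # An instance or disk was probably deleted while we were listing;
            # remember the error and try again with a fresh listing.
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)
    # Every attempt failed, so surface the last error to the caller.
    raise last_error
```

With the GCE driver, the exception to pass would presumably be 
libcloud.common.google.ResourceNotFoundError, the class shown in the traceback 
below, e.g. list_nodes_with_retry(driver, ResourceNotFoundError). A retry only 
narrows the race window; it does not eliminate it.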

I'd like to close this issue if you're OK with that.



> GCE: list_nodes occasionally failing with ResourceNotFoundError when 
> instances being deleted
> --------------------------------------------------------------------------------------------
>
>                 Key: LIBCLOUD-799
>                 URL: https://issues.apache.org/jira/browse/LIBCLOUD-799
>             Project: Libcloud
>          Issue Type: Bug
>          Components: Core
>            Reporter: Colin Pitrat
>
> I'm using libcloud version 0.18.0 (version not available in the dropdown list 
> above)
> When listing instances on GCE while I (or another user) concurrently delete 
> instances on the same project, I occasionally get the following exception:
>   File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 1601, in list_nodes
>     v.get('instances', [])]
>   File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 5283, in _to_node
>     extra['boot_disk'] = self.ex_get_volume(bd['name'], bd['zone'])
>   File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 4165, in ex_get_volume
>     response = self.connection.request(request, method='GET').object
>   File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 120, in request
>     response = super(GCEConnection, self).request(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/libcloud/common/google.py", line 692, in request
>     *args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/libcloud/common/base.py", line 799, in request
>     response = responseCls(**kwargs)
>   File "/usr/lib/python2.7/site-packages/libcloud/common/base.py", line 145, in __init__
>     self.object = self.parse_body()
>   File "/usr/lib/python2.7/site-packages/libcloud/common/google.py", line 263, in parse_body
>     raise ResourceNotFoundError(message, self.status, code)
> libcloud.common.google.ResourceNotFoundError: {u'domain': u'global', u'message': u"The resource 'projects/xxxx/zones/xxxx/disks/xxxx-5802f' was not found", u'reason': u'notFound'}
> I think the exception should be caught in 
> "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 5283 
> when the volume corresponding to the instance being deleted is not found.
> Regards,
> Colin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
