Hi,

(sorry to reply to my own email!)

On 28 Apr 2014, at 11:42, Dave Scott <dave.sc...@citrix.com> wrote:

> 
> Hi Mike,
> 
> On 28 Apr 2014, at 04:44, Mike Tutkowski <mike.tutkow...@solidfire.com> wrote:
> 
>> Hi,
>> 
>> I recently installed 6.2 with XS62ESP1 and XS62ESP1004 (so that
>> Xenserver625StorageProcessor would be utilized).
>> 
>> When I create a cloud from scratch, my SSVM starts up fine, but CPVM ends
>> up in the Paused state. I have to force a shutdown of that VM and then
>> CloudStack restarts it and it works. This consistently happens. The system
>> VMs are being deployed to the local storage of the one XS host I have in my
>> one and only cluster.
>> 
>> Any thoughts on that?
> 
> I’m seeing the same symptom on my test cloud with 6.2 and XS62ESP1004. I 
> think there’s a problem with XenAPI session and task handling in the 
> cloudstack master branch, although I’ve not tracked it down yet. In my 
> management server log I see:
> 
> WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-5:ctx-47dccee1) Unable to start VM(v-2-VM) on host(1c4a31e9-469e-45c3-a0ad-9792ac7b20f6) due to You gave an invalid session reference.  It may have been invalidated by a server restart, or timed out.  You should get a new session handle, using one of the session.login_ calls.  This error does not invalidate the current connection.  The handle parameter echoes the bad value given.
> You gave an invalid session reference.  It may have been invalidated by a server restart, or timed out.  You should get a new session handle, using one of the session.login_ calls.  This error does not invalidate the current connection.  The handle parameter echoes the bad value given.
>        at com.xensource.xenapi.Types.checkResponse(Types.java:218)
>        at com.xensource.xenapi.Connection.dispatch(Connection.java:395)
>        at com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:463)
>        at com.xensource.xenapi.Event.from(Event.java:270)
>        at org.apache.cloudstack.hypervisor.xenserver.XenServerResourceNewBase.waitForTask(XenServerResourceNewBase.java:113)
>        at com.cloud.hypervisor.xen.resource.CitrixResourceBase.startVM(CitrixResourceBase.java:3455)
> 
> Somehow the XenAPI session being used by the Event.from call in 
> XenServerResourceNewBase.waitForTask (used for recent 6.2 XenServers only) is 
> being logged out somewhere. When this happens, the cloudstack cleanup code 
> calls Task.cancel and Task.destroy, and then the XenServer Async.VM.start 
> fails trying to update Task.progress before it internally calls VM.unpause.
> 
> I made a hack to disable caching of Connection/sessions:
> 
> https://github.com/djs55/cloudstack/commit/a388b71279086e42710e26340df0632d0d8135e4

For reference / experimentation, I’ve made a slightly more plausible patch:

https://github.com/djs55/cloudstack/commit/9d40f56c6384d04a5f0fb22e5b97530c0164e0b2

It catches SESSION_INVALID in the XenServerConnection and transparently 
logs back in. This would save the higher-level parts of the XenServer plugin 
from having to deal with sessions expiring beneath them.
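
To illustrate the idea (this is not the code in the commit above, just a 
minimal sketch): every type below is a hypothetical stand-in rather than the 
real com.xensource.xenapi or CloudStack classes, and it assumes that retrying 
a call once after a fresh login is acceptable.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: every type here is a hypothetical stand-in, not the real
// com.xensource.xenapi or CloudStack classes.
class ReloginSketch {

    /** Stand-in for the error the server reports as SESSION_INVALID. */
    static class SessionInvalidException extends Exception {
        SessionInvalidException(String handle) {
            super("SESSION_INVALID: " + handle);
        }
    }

    /** Hypothetical low-level transport: log in, and dispatch a call under a session. */
    interface Transport {
        String login() throws Exception;
        Object dispatch(String session, String method) throws Exception;
    }

    /** Connection wrapper that re-authenticates once when the session has expired. */
    static class ReloginConnection {
        private final Transport transport;
        private String session;

        ReloginConnection(Transport transport) throws Exception {
            this.transport = transport;
            this.session = transport.login();
        }

        synchronized Object dispatch(String method) throws Exception {
            try {
                return transport.dispatch(session, method);
            } catch (SessionInvalidException e) {
                // The cached session was logged out or timed out: get a new
                // one and retry exactly once. A second failure propagates.
                session = transport.login();
                return transport.dispatch(session, method);
            }
        }
    }

    /** Fake server whose sessions can be invalidated, to exercise the wrapper. */
    static class FakeServer implements Transport {
        final AtomicInteger logins = new AtomicInteger();
        private String valid;

        public String login() {
            valid = "session-" + logins.incrementAndGet();
            return valid;
        }

        public Object dispatch(String session, String method) throws SessionInvalidException {
            if (!session.equals(valid)) {
                throw new SessionInvalidException(session);
            }
            return method + " ok";
        }

        void logoutEverything() {
            valid = null;
        }
    }

    public static void main(String[] args) throws Exception {
        FakeServer server = new FakeServer();
        ReloginConnection conn = new ReloginConnection(server);
        server.logoutEverything();                       // simulate the stray logout
        System.out.println(conn.dispatch("event.from")); // call still succeeds
        System.out.println("logins: " + server.logins.get());
    }
}
```

Callers of the wrapper never see the expired session. The catch is that a 
call which is not safe to repeat would need more care than a blind retry.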

Cheers,
Dave

> 
> I suspect this now leaks Connections/sessions, but the symptom goes away.
> 
> So far my thoughts are:
> 
> 1. We need to find who’s calling session.logout and why; this will help fix 
> the problem in the short term.
> 
> 2. The XenServer XenAPI bindings are harder to use than they should be 
> (IMHO). In particular I think the bindings should take care of handling 
> SESSION_INVALID exceptions and re-authenticating transparently, to avoid 
> polluting the cloudstack code with rarely-used exception handlers.
> 
> 3. The semantics of XenAPI task.destroy could be improved: instead of 
> immediately removing the task (which then seems to cause cleanup code to fail 
> randomly), it should be more like Unix waitpid with WNOHANG, i.e. set 
> a bit which says, “I’m done with this. Destroy it when you are finished with 
> it.”
> 
> 
>> 
>> Also, if I try to kick off a user VM to local storage, I get the
>> general-purpose InsufficientCapacityException and the virtual router does
>> not even start up.
> 
> No idea about this one :)
> 
> Cheers,
> Dave
> 
>> 
>> Can anyone create a similar cloud to what I've described here with XS 6.2,
>> XS62ESP1, and XS62ESP1004? I re-ran this test using a XS 6.1 host and it
>> works just fine.
>> 
>> At the moment, this is blocking a test case I'm trying to execute to verify
>> code I had to write in Xenserver625StorageProcessor.
>> 
>> Thanks!
>> 
>> -- 
>> *Mike Tutkowski*
>> *Senior CloudStack Developer, SolidFire Inc.*
>> e: mike.tutkow...@solidfire.com
>> o: 303.746.7302
>> Advancing the way the world uses the
>> cloud<http://solidfire.com/solution/overview/?video=play>
>> *(tm)*
> 
