Hi Mike,
On 28 Apr 2014, at 04:44, Mike Tutkowski <[email protected]> wrote:
> Hi,
>
> I recently installed 6.2 with XS62ESP1 and XS62ESP1004 (so that
> Xenserver625StorageProcessor would be utilized).
>
> When I create a cloud from scratch, my SSVM starts up fine, but CPVM ends
> up in the Paused state. I have to force a shutdown of that VM and then
> CloudStack restarts it and it works. This consistently happens. The system
> VMs are being deployed to the local storage of the one XS host I have in my
> one and only cluster.
>
> Any thoughts on that?
I’m seeing the same symptom on my test cloud with 6.2 and XS62ESP1004. I think
there’s a problem with XenAPI session and task handling in the cloudstack
master branch, although I’ve not tracked it down yet. In my management server
log I see:
WARN [c.c.h.x.r.CitrixResourceBase] (DirectAgent-5:ctx-47dccee1) Unable to start VM(v-2-VM) on host(1c4a31e9-469e-45c3-a0ad-9792ac7b20f6) due to You gave an invalid session reference. It may have been invalidated by a server restart, or timed out. You should get a new session handle, using one of the session.login_ calls. This error does not invalidate the current connection. The handle parameter echoes the bad value given.
You gave an invalid session reference. It may have been invalidated by a server restart, or timed out. You should get a new session handle, using one of the session.login_ calls. This error does not invalidate the current connection. The handle parameter echoes the bad value given.
    at com.xensource.xenapi.Types.checkResponse(Types.java:218)
    at com.xensource.xenapi.Connection.dispatch(Connection.java:395)
    at com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:463)
    at com.xensource.xenapi.Event.from(Event.java:270)
    at org.apache.cloudstack.hypervisor.xenserver.XenServerResourceNewBase.waitForTask(XenServerResourceNewBase.java:113)
    at com.cloud.hypervisor.xen.resource.CitrixResourceBase.startVM(CitrixResourceBase.java:3455)
Somehow the XenAPI session used by Event.from in
XenServerResourceNewBase.waitForTask (a code path used only for recent 6.2
XenServers) is being logged out somewhere. When this happens, the cloudstack
cleanup code calls Task.cancel and Task.destroy, and then the XenServer
Async.VM.start fails while trying to update Task.progress, before it internally
calls VM.unpause. That would explain why the VM is left in the Paused state.
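A toy Java model of the suspected ordering may make it easier to see why the VM ends up Paused. Everything in it (Task, Vm, startVm, destroyNow, destroyDeferred) is hypothetical and deliberately simplified; it is not the XenAPI bindings or xapi's implementation. It assumes the client-side cleanup runs after the domain is built paused but before the final progress update and unpause. The deferredDestroy mode models the "destroy it when you are finished with it" semantics suggested further down.

```java
// Toy model only: hypothetical stand-ins, not the XenAPI Java bindings or xapi.
public class TaskDestroyRaceSketch {

    // A server-side task record; progress updates fail once the record is gone.
    static class Task {
        private boolean destroyed = false;
        private boolean destroyWhenDone = false; // hypothetical "I'm done with this" bit
        private boolean finished = false;
        private double progress = 0.0;

        void setProgress(double p) {
            if (destroyed) {
                throw new IllegalStateException("task record no longer exists");
            }
            progress = p;
        }

        // Behaviour as observed: the record is removed immediately.
        void destroyNow() { destroyed = true; }

        // Proposed behaviour: only mark the task; remove it once the operation finishes.
        void destroyDeferred() {
            if (finished) { destroyed = true; } else { destroyWhenDone = true; }
        }

        void markFinished() {
            finished = true;
            if (destroyWhenDone) { destroyed = true; }
        }
    }

    static class Vm {
        String powerState = "Halted";
    }

    // Models Async.VM.start: the domain is built paused, task progress is updated,
    // and only then is the VM unpaused. Client cleanup runs in between.
    static String startVm(boolean deferredDestroy) {
        Vm vm = new Vm();
        Task task = new Task();
        vm.powerState = "Paused"; // domain created paused

        // The client's event wait failed with SESSION_INVALID, so it cleans up the task.
        if (deferredDestroy) { task.destroyDeferred(); } else { task.destroyNow(); }

        try {
            task.setProgress(0.9);     // throws if the task record was already removed
            vm.powerState = "Running"; // stands in for VM.unpause
        } catch (IllegalStateException e) {
            // The start aborts before unpausing, so the VM stays Paused.
        } finally {
            task.markFinished();
        }
        return vm.powerState;
    }

    public static void main(String[] args) {
        System.out.println("immediate destroy: " + startVm(false)); // Paused
        System.out.println("deferred destroy:  " + startVm(true));  // Running
    }
}
```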
I made a hack to disable caching of Connection/sessions:
https://github.com/djs55/cloudstack/commit/a388b71279086e42710e26340df0632d0d8135e4
I suspect this now leaks Connections/sessions, but the symptom goes away.
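A hedged sketch of why disabling the cache would hide the symptom and also why it probably leaks: if the pool hands the same cached session to every caller, one caller's session.logout invalidates the session another caller is blocked on in Event.from. Without caching, each caller gets its own session, but nothing logs them out. Host, Pool, waiterSurvives and sessionsAfter are hypothetical toy names; this is not CloudStack's XenServerConnectionPool.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model: hypothetical classes, not CloudStack's XenServerConnectionPool.
public class SessionCachingSketch {

    // Stand-in for the host's table of live sessions.
    static class Host {
        private final Set<String> live = new HashSet<>();
        private int next = 0;

        String login() {
            String ref = "OpaqueRef:session-" + (next++);
            live.add(ref);
            return ref;
        }

        void logout(String ref) { live.remove(ref); }

        // Any call made with a dead session reference fails, as in the log above.
        void call(String ref) {
            if (!live.contains(ref)) {
                throw new IllegalStateException("SESSION_INVALID " + ref);
            }
        }

        int liveSessions() { return live.size(); }
    }

    // Hypothetical pool: with caching on, every caller shares one session per host.
    static class Pool {
        private final Host host;
        private final boolean cache;
        private final Map<Host, String> cached = new HashMap<>();

        Pool(Host host, boolean cache) { this.host = host; this.cache = cache; }

        String session() {
            if (!cache) { return host.login(); }
            return cached.computeIfAbsent(host, Host::login);
        }
    }

    // Caller A is waiting on events while caller B finishes and logs out "its" session.
    // Returns true if A's next call still succeeds.
    static boolean waiterSurvives(boolean cache) {
        Host host = new Host();
        Pool pool = new Pool(host, cache);
        String a = pool.session(); // e.g. the session used by Event.from in waitForTask
        String b = pool.session(); // some other code path
        host.logout(b);
        try {
            host.call(a);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // Without any logout, every uncached lookup leaves one more live session behind.
    static int sessionsAfter(int lookups, boolean cache) {
        Host host = new Host();
        Pool pool = new Pool(host, cache);
        for (int i = 0; i < lookups; i++) {
            pool.session();
        }
        return host.liveSessions();
    }

    public static void main(String[] args) {
        System.out.println("cached waiter survives:   " + waiterSurvives(true));  // false
        System.out.println("uncached waiter survives: " + waiterSurvives(false)); // true
        System.out.println("sessions after 5 uncached lookups: " + sessionsAfter(5, false)); // 5
    }
}
```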
So far my thoughts are:
1. We need to find who's calling session.logout and why; this will help fix
the problem in the short term.
2. The XenServer XenAPI bindings are harder to use than they should be (IMHO).
In particular, I think the bindings should handle SESSION_INVALID exceptions
and re-authenticate transparently, to avoid polluting the cloudstack code with
rarely used exception handlers.
3. The semantics of XenAPI task.destroy could be improved: instead of
immediately removing the task (which, it seems, then causes cleanup code to
fail randomly), it should be more like Unix waitpid with WNOHANG, i.e. set a
bit which says, “I’m done with this. Destroy it when you are finished with it.”
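For point 2, a minimal sketch of transparent re-authentication, assuming a thin wrapper layer around the bindings. ReauthenticatingSession and SessionInvalidException are hypothetical names (the exception stands in for whatever the real bindings raise for SESSION_INVALID), and session references are plain strings. The sketch also assumes SESSION_INVALID means the call was rejected before doing any work, so a single retry after a fresh login is safe; it is not thread-safe.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

// Sketch only: hypothetical wrapper, not part of the XenServer Java bindings.
public class ReauthSketch {

    // Stand-in for the bindings' SESSION_INVALID error.
    static class SessionInvalidException extends RuntimeException {
        SessionInvalidException(String ref) { super("SESSION_INVALID " + ref); }
    }

    // Callers never see SESSION_INVALID unless the call fails again after re-login.
    static class ReauthenticatingSession {
        private final Supplier<String> login;
        private String ref;
        int logins = 0;

        ReauthenticatingSession(Supplier<String> login) {
            this.login = login;
            this.ref = doLogin();
        }

        private String doLogin() {
            logins++;
            return login.get();
        }

        // Runs op with the current session; on SESSION_INVALID, logs in again and retries once.
        <T> T call(Function<String, T> op) {
            try {
                return op.apply(ref);
            } catch (SessionInvalidException e) {
                ref = doLogin();
                return op.apply(ref);
            }
        }
    }

    static String demo() {
        Set<String> live = new HashSet<>(); // fake host-side session table
        int[] counter = {0};
        ReauthenticatingSession s = new ReauthenticatingSession(() -> {
            String r = "OpaqueRef:session-" + (counter[0]++);
            live.add(r);
            return r;
        });

        live.clear(); // someone else logged our session out behind our back

        String result = s.call(ref -> {
            if (!live.contains(ref)) {
                throw new SessionInvalidException(ref);
            }
            return "ok via " + ref;
        });
        return result + " after " + s.logins + " logins";
    }

    public static void main(String[] args) {
        System.out.println(demo()); // ok via OpaqueRef:session-1 after 2 logins
    }
}
```

Keeping the retry inside the wrapper means code such as waitForTask only has to handle genuine failures rather than carrying its own SESSION_INVALID handler.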
>
> Also, if I try to kick off a user VM to local storage, I get the
> general-purpose InsufficientCapacityException and the virtual router does
> not even start up.
No idea about this one :)
Cheers,
Dave
>
> Can anyone create a similar cloud to what I've described here with XS 6.2,
> XS62ESP1, and XS62ESP1004? I re-ran this test using a XS 6.1 host and it
> works just fine.
>
> At the moment, this is blocking a test case I'm trying to execute to verify
> code I had to write in Xenserver625StorageProcessor.
>
> Thanks!
>
> --
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: [email protected]
> o: 303.746.7302