I've now got this fixed in my deployment. I think I may have stumbled onto a
bug in CS 4.0.1, so I'll explain.

When I did the initial zone setup, the NFS export being used for secondary
storage didn't work, so I ended up flipping through a couple of our NFS
servers before I finally got it working. After that the system templates,
default template and system VMs deployed without issue.

Fast forward to now: I decided to change the NFS export URL. I made the
needed changes in the host table of the cloud database and then destroyed
the SSVM to have it redeploy. That is where my issue started: the zone
complained that the template wasn't ready for deployment. Apparently
the MS assumes that the first system template id in cloud.template_host_ref
is the correct template to deploy from. Due to my initial misconfiguration
I had two of these system template ids.

Error seen in the management-server.log:
2013-03-08 14:44:55,815 DEBUG [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone host is ready, but secondary storage vm template: 1 is not ready on secondary storage: 6

Let's look at the system templates in the DB:
mysql> select template_id,install_path,download_pct,download_state,error_str,destroyed
    -> from cloud_old.template_host_ref
    -> where url = "http://download.cloud.com/templates/acton/acton-systemvm-02062012.vhd.bz2";
+-------------+--------------------+--------------+----------------+-----------+-----------+
| template_id | install_path       | download_pct | download_state | error_str | destroyed |
+-------------+--------------------+--------------+----------------+-----------+-----------+
|           9 | template/tmpl/1/9/ |          100 | DOWNLOADED     | NULL      |         0 |
|           1 | NULL               |            0 | NOT_DOWNLOADED |           |         0 |
+-------------+--------------------+--------------+----------------+-----------+-----------+
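
For anyone else chasing this, a slightly wider query may make the duplicate rows easier to spot. This is only a sketch: it assumes template_host_ref has a host_id column identifying the secondary storage host (6 in the log line above) and that the live schema is named cloud; adjust both to your install.

```sql
-- Sketch: show which secondary storage host each system template row belongs to.
-- host_id and the "cloud" schema name are assumptions about the local install.
SELECT host_id, template_id, install_path, download_state, destroyed
FROM cloud.template_host_ref
WHERE url LIKE '%acton-systemvm-02062012.vhd.bz2'
ORDER BY host_id, template_id;
```

Any row that is NOT_DOWNLOADED with a NULL install_path for the template id named in the log is the one to look at.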

I'm guessing the destroyed flag for template_id 1 needs to be set to 1. Not
sure why this wasn't set properly during installation.

So the fix was to set the install_path in cloud.template_host_ref to a
proper value, as this wasn't set properly in my DB:

mysql> update cloud.template_host_ref set install_path = "template/tmpl/1/1" where template_id = 1;
mysql> update cloud.template_host_ref set download_pct = "100" where template_id = 1;
mysql> update cloud.template_host_ref set download_state = "DOWNLOADED" where template_id = 1;
mysql> update cloud.template_host_ref set error_str = NULL where template_id = 1;
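
The same change can probably be made in a single statement inside a transaction, which leaves room to back out before anything is committed. A minimal sketch, assuming the table is InnoDB (so the transaction actually applies) and that you have taken a dump of the cloud database first:

```sql
-- Sketch only: same column values as the four updates above, applied at once.
START TRANSACTION;

UPDATE cloud.template_host_ref
SET install_path   = 'template/tmpl/1/1',
    download_pct   = 100,
    download_state = 'DOWNLOADED',
    error_str      = NULL
WHERE template_id = 1;

-- Inspect the row before making the change permanent.
SELECT template_id, install_path, download_pct, download_state, error_str
FROM cloud.template_host_ref
WHERE template_id = 1;

COMMIT;  -- or ROLLBACK; if the SELECT shows something unexpected
```

If more than one secondary storage host has a row for template_id 1, the WHERE clause would need narrowing so that only the intended row is touched.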

Once I made this update, I restarted the cloud-management daemon on my MS
and my System VMs redeployed.

In hindsight, I suppose I could have just set the destroyed value for
template_id 1 to 1 and that may have had the same effect.



On Fri, Mar 8, 2013 at 2:50 PM, Jason Davis <scr...@gmail.com> wrote:

> Restarted the MS and tailed management-server.log. Unfortunately I'm not
> seeing any smoking guns in the logs to explain why my install is being
> cranky:
>
> 2013-03-08 14:44:55,815 DEBUG [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone host is ready, but secondary storage vm template: 1 is not ready on secondary storage: 6
> 2013-03-08 14:44:55,817 DEBUG [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone 1 is not ready to launch secondary storage VM yet
> 2013-03-08 14:44:56,091 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone host is ready, but console proxy template: 1 is not ready on secondary storage: 6
> 2013-03-08 14:44:56,092 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone 1 is not ready to launch console proxy yet
>
>
> Although I did catch an interesting error when the MS was talking to the
> agent on my HV host:
>
> 2013-03-08 14:44:05,628 INFO  [xen.discoverer.XcpServerDiscoverer] (AgentTaskPool-1:null) Host: CS-XS-01 connected with hypervisor type: XenServer. Checking CIDR...
> 2013-03-08 14:44:05,782 DEBUG [cloud.resource.ResourceState] (AgentTaskPool-1:null) Resource state update: [id = 3; name = CS-XS-01; old state = Enabled; event = InternalCreated; new state = Enabled]
> 2013-03-08 14:44:05,783 DEBUG [cloud.host.Status] (AgentTaskPool-1:null) Transition:[Resource state = Enabled, Agent event = AgentConnected, Host id = 3, name = CS-XS-01]
> 2013-03-08 14:44:05,805 DEBUG [cloud.host.Status] (AgentTaskPool-1:null) Agent status update: [id = 3; name = CS-XS-01; old status = Disconnected; event = AgentConnected; new status = Connecting; old update count = 21; new update count = 22]
> 2013-03-08 14:44:05,808 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentTaskPool-1:null) create ClusteredDirectAgentAttache for 3
> 2013-03-08 14:44:05,809 INFO  [agent.manager.DirectAgentAttache] (AgentTaskPool-1:null) StartupAnswer received 3 Interval = 60
> 2013-03-08 14:44:05,819 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Connect to listener: XcpServerDiscoverer$$EnhancerByCGLIB$$d3a31083
> 2013-03-08 14:44:05,829 DEBUG [xen.discoverer.XcpServerDiscoverer] (AgentTaskPool-1:null) Setting up host 3
> 2013-03-08 14:44:05,839 DEBUG [agent.transport.Request] (AgentTaskPool-1:null) Seq 3-1184497665: Sending  { Cmd , MgmtId: 345050858316, via: 3, Ver: v1, Flags: 100111, [{"SetupCommand":{"env":{},"multipath":false,"needSetup":false,"wait":0}}] }
> 2013-03-08 14:44:05,840 DEBUG [agent.transport.Request] (AgentTaskPool-1:null) Seq 3-1184497665: Executing:  { Cmd , MgmtId: 345050858316, via: 3, Ver: v1, Flags: 100111, [{"SetupCommand":{"env":{},"multipath":false,"needSetup":false,"wait":0}}] }
> 2013-03-08 14:44:05,843 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-1:null) Seq 3-1184497665: Executing request
> 2013-03-08 14:44:06,038 INFO  [xen.resource.CitrixResourceBase] (DirectAgent-1:null) Host 147.26.14.170 OpaqueRef:be755447-313a-6624-38e5-69c15735b352: Host 147.26.14.170 is already setup.
> 2013-03-08 14:44:08,719 WARN  [xen.resource.CitrixResourceBase] (DirectAgent-1:null) forget SR catch Exception due to
> The server failed to handle your request, due to an internal error.  The given message may give details useful for debugging the problem.
>         at com.xensource.xenapi.Types.checkResponse(Types.java:1510)
>         at com.xensource.xenapi.Connection.dispatch(Connection.java:368)
>         at com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:909)
>         at com.xensource.xenapi.PBD.unplug(PBD.java:465)
>         at com.cloud.hypervisor.xen.resource.CitrixResourceBase.cleanupTemplateSR(CitrixResourceBase.java:4518)
>         at com.cloud.hypervisor.xen.resource.CitrixResourceBase.execute(CitrixResourceBase.java:4544)
>         at com.cloud.hypervisor.xen.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:485)
>         at com.cloud.hypervisor.xen.resource.XenServer56Resource.executeRequest(XenServer56Resource.java:73)
>         at com.cloud.agent.manager.DirectAgentAttache$Task.run(DirectAgentAttache.java:191)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:722)
> 2013-03-08 14:44:08,857 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-1:null) Seq 3-1184497665: Response Received:
> 2013-03-08 14:44:08,858 DEBUG [agent.transport.Request] (DirectAgent-1:null) Seq 3-1184497665: Processing:  { Ans: , MgmtId: 345050858316, via: 3, Ver: v1, Flags: 110, [{"SetupAnswer":{"_reconnect":false,"result":true,"wait":0}}] }
> 2013-03-08 14:44:08,858 DEBUG [agent.transport.Request] (AgentTaskPool-1:null) Seq 3-1184497665: Received:  { Ans: , MgmtId: 345050858316, via: 3, Ver: v1, Flags: 110, { SetupAnswer } }
>
> However this seems to me unrelated.
>
>
> On Fri, Mar 8, 2013 at 2:24 PM, Ahmad Emneina <aemne...@gmail.com> wrote:
>
>> Got it, let's see the full management server log. We should be able to
>> find out where the MS isn't cooperating.
>>
>>
>> On Fri, Mar 8, 2013 at 12:19 PM, Jason Davis <scr...@gmail.com> wrote:
>>
>>> Yup that's what I did, however the MS refuses to spin up a fresh copy of
>>> the SSVM.
>>>
>>>
>>> On Fri, Mar 8, 2013 at 2:09 PM, Ahmad Emneina <aemne...@gmail.com>wrote:
>>>
>>>> I believe you also have to destroy the old secondary storage vm. That way
>>>> it gets programmed with the new path to mount.
>>>>
>>>>
>>>> On Fri, Mar 8, 2013 at 11:51 AM, Jason Davis <scr...@gmail.com> wrote:
>>>>
>>>> > Sorry for bumping this old thread but...
>>>> >
>>>> > Did you ever get this figured out Andrei? I am running into the exact same
>>>> > issue and after some playtime in the DB I can't seem to get this to behave.
>>>> >
>>>> >
>>>> > On Mon, Feb 4, 2013 at 4:59 PM, Andrei Mikhailovsky <and...@arhont.com> wrote:
>>>> >
>>>> > >
>>>> > >
>>>> > >
>>>> > > >did you make the db changes while the management server was up and running?
>>>> > > >Have you restarted the management server since making the db modifications?
>>>> > >
>>>> > > AM: Yes, I made the change while the management server was running, and
>>>> > > restarted it right after the change had been made. I went back to the db
>>>> > > after the restart of the management server to make sure the values had
>>>> > > been saved. They are correct.
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Mon, Feb 4, 2013 at 6:20 AM, Andrei Mikhailovsky <and...@arhont.com> wrote:
>>>> > >
>>>> > > > Hello guys,
>>>> > > >
>>>> > > > I am having an issue with the SSVM not starting after I've changed the URL
>>>> > > > of the secondary storage server. I am running a single instance of CS 4.0.0
>>>> > > > on Centos 6. Here is what I've done:
>>>> > > >
>>>> > > > 1. I've modified the host and host_details tables in the DB to change the
>>>> > > > URL of the secondary storage server.
>>>> > > > 2. I've restarted the CS management server
>>>> > > > 3. Logged in to CS gui and made sure the secondary storage server shows
>>>> > > > correct details. It did.
>>>> > > > 4. Restarted SSVM and logged in to SSVM and ran the ssvm check script. It
>>>> > > > showed that nfs mountpoint is not mounted.
>>>> > > > 5. Verified that SSVM has network and it can reach the nfs server. It did.
>>>> > > > 6. Manually mounted the nfs share using: mount -t nfs -o mountproto=tcp
>>>> > > > server:/path /path. That worked as well.
>>>> > > > 7. Restarted SSVM again and ran the check script again. No joy.
>>>> > > > 8. Deleted SSVM server hoping CS would create a new ssvm instance and all
>>>> > > > will work okay. The new SSVM is not being created. Log file entries show:
>>>> > > >
>>>> > > > ----
>>>> > > >
>>>> > > > 2013-02-04 13:57:19,336 DEBUG [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone host is ready, but secondary storage vm template: 3 is not ready on secondary storage: 6
>>>> > > > 2013-02-04 13:57:19,336 DEBUG [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone 1 is not ready to launch secondary storage VM yet
>>>> > > >
>>>> > > > 2013-02-04 13:57:19,444 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone host is ready, but console proxy template: 3 is not ready on secondary storage: 6
>>>> > > > 2013-02-04 13:57:19,444 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone 1 is not ready to launch console proxy yet
>>>> > > > 2013-02-04 13:57:19,956 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 7 routers.
>>>> > > > 2013-02-04 13:57:23,600 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-8:null) Ping from 21
>>>> > > > 2013-02-04 13:57:35,517 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-13:null) Ping from 20
>>>> > > > 2013-02-04 13:57:41,166 DEBUG [cloud.server.StatsCollector] (StatsCollector-1:null) StorageCollector is running...
>>>> > > > 2013-02-04 13:57:41,168 DEBUG [cloud.server.StatsCollector] (StatsCollector-1:null) There is no secondary storage VM for secondary storage host nfs://192.168.169.200/cloudstack-secondary
>>>> > > >
>>>> > > > ----
>>>> > > >
>>>> > > > I do not see any errors or exceptions in the logs. I've even rebooted the
>>>> > > > CS management server. Still, no joy ((
>>>> > > >
>>>> > > > I've checked the vm_template table and the template with id 3 looks okay:
>>>> > > >
>>>> > > > | 3 | routing-3 | SystemVM Template (KVM) | 8d335295-558c-4378-839a-f2e816aebb6c | 0 | 0 | SYSTEM | 0 | 64 | http://download.cloud.com/templates/acton/acton-systemvm-02062012.qcow2.bz2 | QCOW2 | 2012-10-29 23:39:25 | NULL | 1 | 2755de1f9ef2ce4d6f2bee2efbb4da92 | SystemVM Template (KVM) | 0 | 0 | 15 | 1 | 0 | 1 | 0 | KVM | NULL | NULL | 0 |
>>>> > > >
>>>> > > >
>>>> > > > The secondary storage host entry has an Alert status (which could cause
>>>> > > > the problem):
>>>> > > >
>>>> > > > | 6 | nfs://192.168.169.200/cloudstack-secondary | 8e143df9-580c-481d-9e1d-eadfe7474867 | Alert | SecondaryStorage | nfs | 255.255.255.0 | 00:19:bb:34:35:1e | 192.168.169.250 | 255.255.255.0 | 00:19:bb:34:35:1e | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1 | NULL | NULL | NULL | nfs://192.168.169.200/cloudstack-secondary | NULL | None | NULL | 0 | NULL | 4.0.0.20121029120443 | 4e31c7b3-9333-3e6f-8a04-86d4bec5b576 | 2064199680 | NULL | nfs://192.168.169.200/cloudstack-secondary | 1 | 0 | 0 | 1319922183 | NULL | NULL | 2012-10-30 12:31:55 | NULL | 3 | Enabled |
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > > > I am not sure if I can simply change the db entry of the Status column
>>>> > > > from Alert to UP? I do not want to lose the secondary storage server as
>>>> > > > I've got a bunch of templates, isos and snapshots that I do not want to
>>>> > > > recreate. Does anyone know what else to try to get back the SSVM?
>>>> > > >
>>>> > > > Many thanks
>>>> > > >
>>>> > > > Andrei
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > >
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>
