whitetiger264 opened a new issue, #7244:
URL: https://github.com/apache/cloudstack/issues/7244
##### ISSUE TYPE
* Bug Report
##### COMPONENT NAME
~~~
NFS & SSVM
~~~
##### CLOUDSTACK VERSION
~~~
4.17.2
~~~
##### CONFIGURATION
~~~
KVM Host, using the advanced network.
~~~
##### OS / ENVIRONMENT
~~~
KVM Host & CS Host Both AlmaLinux 8
~~~
##### SUMMARY
NFS secondary storage fails to mount in the SSVM when the SSVM or CloudStack is restarted.
##### STEPS TO REPRODUCE
This will most likely not be reproducible elsewhere; it may be specific to my environment.
~~~
I have two NFS secondary storage servers. They run on a completely external
network, entirely remote from my CloudStack network. These are their public IP
addresses:
1. NFS One: 102.165.XXX.YYY
2. NFS Two: 102.165.XXX.ZZZ

I have a private network for my CloudStack environment, 192.168.50.0/24, and
both my management and storage networks fall within it, using the same
192.168.50.1 gateway. Keep in mind, though, that my actual NFS storage is
remote, reached over the internet via public IP.

For months I connected to my NFS secondary storage via public IP with zero
issues, until one morning it simply failed. The mount error in the logs shows:
```
2023-02-16 10:24:22,423 ERROR [storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-2:null) GetRootDir for nfs://nfsip/data/secondary failed
due to com.cloud.utils.exception.CloudRuntimeException: Unable to create local
folder for: /mnt/SecStorage/91de6d1c-4c04-359c-82b5-fdcfe4a83da7 in order to
mount nfs://102.165.XXX.ZZZ/data/secondary
com.cloud.utils.exception.CloudRuntimeException: Unable to create local
folder for: /mnt/SecStorage/91de6d1c-4c04-359c-82b5-fdcfe4a83da7 in order to
mount nfs://102.165.XXX.ZZZ/data/secondary
```
Upon further investigation, I found that when the SSVM or CloudStack is
rebooted, the SSVM adds the following entry to its IP route table:
- 102.165.XXX.ZZZ via 192.168.50.1 dev eth1

From CloudStack's point of view this is correct, because 192.168.50.1 is the
default gateway for the storage network. However, if I remove this entry from
the SSVM's route table, the NFS mount succeeds, because the traffic then goes
out via the default public route.
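For testing, the offending route can be removed by hand inside the SSVM (this is only a sketch using the masked addresses from this report; the change does not persist across reboots, and CloudStack may re-add it):

```shell
# Run inside the SSVM (e.g. s-145-VM) as root.
# 102.165.XXX.ZZZ is the masked NFS server address from this report;
# substitute the real address.
ip route del 102.165.XXX.ZZZ via 192.168.50.1 dev eth1
```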
So here we can see why the mount fails:
```
root@s-145-VM:~# mount -t nfs 102.165.XXX.ZZZ:/data/secondary
/mnt/SecStorage/test
mount.nfs: access denied by server while mounting
102.165.XXX.ZZZ:/data/secondary
root@s-145-VM:~# mount -t nfs -vvv 102.165.XXX.ZZZ:/data/secondary
/mnt/SecStorage/test
mount.nfs: timeout set for Thu Feb 16 14:07:11 2023
mount.nfs: trying text-based options
'vers=4.2,addr=102.165.XXX.ZZZ,clientaddr=192.168.50.53'
mount.nfs: mount(2): Operation not permitted
mount.nfs: trying text-based options 'addr=102.165.XXX.ZZZ'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 102.165.XXX.ZZZ prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 102.165.XXX.ZZZ prog 100005 vers 3 prot UDP port 892
mount.nfs: mount(2): Permission denied
mount.nfs: access denied by server while mounting
102.165.XXX.ZZZ:/data/secondary
```
As you can see, it is trying to mount using the private source address
`192.168.50.53`, and because the NFS server is not on that network, the server
refuses the mount.
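One way to confirm which source address the kernel will pick for a given destination (and hence the `clientaddr` that mount.nfs reports) is `ip route get` — a sketch using the masked address from this report:

```shell
# Inside the SSVM; substitute the real NFS server address.
# The "src" field of the output is the source address the kernel will
# use; with the storage-network route present it shows 192.168.50.53
# rather than the SSVM's public IP.
ip route get 102.165.XXX.ZZZ
```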
Now here's the strange part: as I said, this worked for months. My second NFS
secondary storage server, 102.165.XXX.YYY, is on the same remote network, and
when I mount it from the SSVM it mounts perfectly fine:
```
root@s-145-VM:~# mount -t nfs -vvv 102.165.XXX.YYY:/data/secondary
/mnt/SecStorage/test
mount.nfs: timeout set for Thu Feb 16 14:07:58 2023
mount.nfs: trying text-based options
'vers=4.2,addr=102.165.XXX.YYY,clientaddr=197.189.XXX.YYY'
root@s-145-VM:~#
```
The reason is that the SSVM does not create a route like
`102.165.XXX.YYY via 192.168.50.1` for this server; it keeps using the default
(public) route, which is why the client address shown is the SSVM's public IP,
197.189.XXX.YYY.
This leaves me with a few questions:
1. Can I not use remote/external NFS storage at all?
2. Why does the SSVM create a route only for the 102.165.XXX.ZZZ server and
not for the second NFS server?
3. If the answer to question 1 is no, why has this been working for months?
4. Why does the SSVM mount the second remote NFS server perfectly fine, but
not the one I actually need?
~~~
##### EXPECTED RESULTS
~~~
NFS secondary storage should mount correctly regardless of whether the NFS
server is on a private or a public network.
~~~
##### My Goal
~~~
If using external NFS servers is not recommended, I will happily configure new
NFS servers on a private network. However, that leaves me with a chicken-and-egg
problem. Because the NFS server holding my actual data cannot mount
automatically after a CloudStack or SSVM restart, the verification checks on my
templates fail. They are 100% downloaded, but their ready state shows as "No".
In the database, `template_view` shows their state as "Migrating". Because the
NFS share does not mount, the checks cannot complete, and this prevents me from
moving my data from the existing NFS storage to new NFS storage.

So if we cannot fix the bug above, is there a way to set the state of these
"Migrating" templates to "Ready" so that I can move my data? I tried to update
the table directly, but MySQL returns an error that the table is not updatable
(`template_view` appears to be a database view, not a base table).
And if that is not possible, is there any way to move my data from this NFS
storage to a new NFS storage, for example with rsync?

I am really stuck in a loop here and would appreciate any help.
~~~
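If the decision is to rebuild on a private network, one possible approach (a sketch only, not an official CloudStack procedure; the mount points and the new server name are assumptions) is to mount both exports on a helper machine and copy the contents with rsync, preserving ownership and timestamps:

```shell
# Hypothetical migration sketch; hosts and paths are placeholders.
mkdir -p /mnt/old-secondary /mnt/new-secondary

# Mount the existing (public) export and the new private export:
mount -t nfs 102.165.XXX.ZZZ:/data/secondary /mnt/old-secondary
mount -t nfs new-nfs:/data/secondary /mnt/new-secondary

# Copy everything; -a preserves permissions, ownership and timestamps,
# which the SSVM expects to find intact:
rsync -a --progress /mnt/old-secondary/ /mnt/new-secondary/
```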
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]