whitetiger264 opened a new issue, #7244:
URL: https://github.com/apache/cloudstack/issues/7244

   ##### ISSUE TYPE
    * Bug Report
   
   ##### COMPONENT NAME
   
   ~~~
   NFS & SSVM
   ~~~
   
   ##### CLOUDSTACK VERSION
   
   ~~~
   4.17.2
   ~~~
   
   ##### CONFIGURATION
   
   ~~~
   KVM Host, using the advanced network.
   ~~~
   
   ##### OS / ENVIRONMENT
   
   ~~~
   KVM Host & CS Host Both AlmaLinux 8
   ~~~
   
   
   ##### SUMMARY
   NFS secondary storage fails to mount in the SSVM after an SSVM or CloudStack
   (CS) restart.
   
   
   ##### STEPS TO REPRODUCE
   This will most likely not be reproducible; the cause may be specific to my
   environment.
   
   
   ~~~
   I have two NFS secondary storage servers. They run on a completely external
   network, remote from my CloudStack network. These are their public IP
   addresses:
   
   1. NFS One: 102.165.XXX.YYY
   2. NFS Two: 102.165.XXX.ZZZ
   
   My CloudStack environment uses a private network, 192.168.50.0/24. Both my
   management and storage networks fall within it and share the 192.168.50.1
   gateway. Keep in mind, though, that my actual NFS storage is remote and
   reached over the internet via public IP.
   
   For months I have been connecting to my NFS secondary storage via public IP 
with zero issues until one morning it just failed. The failed mount error in 
the logs shows:
   
   ```
   2023-02-16 10:24:22,423 ERROR [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-2:null) GetRootDir for nfs://nfsip/data/secondary failed due to com.cloud.utils.exception.CloudRuntimeException: Unable to create local folder for: /mnt/SecStorage/91de6d1c-4c04-359c-82b5-fdcfe4a83da7 in order to mount nfs://102.165.XXX.ZZZ/data/secondary
   com.cloud.utils.exception.CloudRuntimeException: Unable to create local folder for: /mnt/SecStorage/91de6d1c-4c04-359c-82b5-fdcfe4a83da7 in order to mount nfs://102.165.XXX.ZZZ/data/secondary
   ```
   
   Upon further investigation, I found that when SSVM or CS is rebooted, the 
SSVM makes the following entry in the IP Route Table:
   
   - 102.165.XXX.ZZZ via 192.168.50.1 dev eth1
   
   This is correct as far as CloudStack is concerned, because 192.168.50.1 is
   the default gateway for the storage network. However, if I remove this entry
   from the routing table in the SSVM, the NFS mount succeeds, because the SSVM
   then connects via the default public route.
   
   So here we can see why the mount fails:
   
   ```
   root@s-145-VM:~# mount -t nfs 102.165.XXX.ZZZ:/data/secondary /mnt/SecStorage/test
   mount.nfs: access denied by server while mounting 102.165.XXX.ZZZ:/data/secondary
   root@s-145-VM:~# mount -t nfs -vvv 102.165.XXX.ZZZ:/data/secondary /mnt/SecStorage/test
   mount.nfs: timeout set for Thu Feb 16 14:07:11 2023
   mount.nfs: trying text-based options 'vers=4.2,addr=102.165.XXX.ZZZ,clientaddr=192.168.50.53'
   mount.nfs: mount(2): Operation not permitted
   mount.nfs: trying text-based options 'addr=102.165.XXX.ZZZ'
   mount.nfs: prog 100003, trying vers=3, prot=6
   mount.nfs: trying 102.165.XXX.ZZZ prog 100003 vers 3 prot TCP port 2049
   mount.nfs: prog 100005, trying vers=3, prot=17
   mount.nfs: trying 102.165.XXX.ZZZ prog 100005 vers 3 prot UDP port 892
   mount.nfs: mount(2): Permission denied
   mount.nfs: access denied by server while mounting 102.165.XXX.ZZZ:/data/secondary
   ```
   
   As you can see, the SSVM is trying to mount from the private IP address
   `192.168.50.53`, and because the NFS server is not on that network, the
   mount is not permitted and fails.
   
   Now, here's the weird part: as I said, this has worked for months. My second
   NFS secondary storage, 102.165.XXX.YYY, is on the same remote network, and
   when I mount it from the SSVM it mounts perfectly fine:
   
   ```
   root@s-145-VM:~# mount -t nfs -vvv 102.165.XXX.YYY:/data/secondary /mnt/SecStorage/test
   mount.nfs: timeout set for Thu Feb 16 14:07:58 2023
   mount.nfs: trying text-based options 'vers=4.2,addr=102.165.XXX.YYY,clientaddr=197.189.XXX.YYY'
   root@s-145-VM:~#
   ```
   
   The reason is that the SSVM does not create a route such as 102.165.XXX.YYY
   via 192.168.50.1 in its routing table for this server. It keeps using the
   default (public) route, which is why we see the SSVM public IP
   197.189.XXX.YYY as the client address.
   
   This now leaves me with a few questions:
   
   1. Can I not make use of remote/external NFS storage?
   2. Why does the SSVM create a route via the storage gateway only for the
   102.165.XXX.ZZZ NFS server and not for the second one?
   3. Why has this been working for months if the answer to question 1 is no?
   4. Why does the SSVM mount the second remote NFS server perfectly fine but
   not do the same for the one NFS server that I really need?
   
   ~~~
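
To make the comparison above repeatable for both NFS servers, here is a minimal shell sketch. The helper names `route_src`, `route_gw`, and `mount_clientaddr` are hypothetical (they are not part of CloudStack or the SSVM image); they only parse the output of `ip route get` and `mount -t nfs -vvv`, on the assumption that the output has the same shape as the transcripts in this report.

```shell
# Hypothetical helpers for inspecting how the SSVM reaches an NFS server.
# They only parse text on stdin; nothing here changes the routing table.

# Print the source address that `ip route get <dest>` reports ("src" field).
route_src() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Print the next-hop gateway ("via" field); prints nothing when the
# destination is directly connected and no gateway is listed.
route_gw() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "via") { print $(i + 1); exit } }'
}

# Print the first clientaddr= value found in `mount -t nfs -vvv` output.
mount_clientaddr() {
    sed -n "s/.*clientaddr=\([^,']*\).*/\1/p" | head -n 1
}

# Example usage inside the SSVM (placeholders as in the report):
#   ip route get 102.165.XXX.ZZZ | route_gw     # 192.168.50.1 when the host route exists
#   ip route get 102.165.XXX.ZZZ | route_src    # private vs. public source address
#   mount -t nfs -vvv 102.165.XXX.ZZZ:/data/secondary /mnt/SecStorage/test 2>&1 | mount_clientaddr
# Manual workaround described in the report (removes the host route):
#   ip route del 102.165.XXX.ZZZ via 192.168.50.1 dev eth1
```

Running the same three checks against 102.165.XXX.YYY should show the public gateway and the SSVM public IP instead, which matches the behaviour described above.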
   
   
   ##### EXPECTED RESULTS
   
   ~~~
   NFS should mount accordingly regardless of private or public NFS server 
network as secondary storage. 
   ~~~
   
   ##### My Goal
   
   ~~~
   If using external NFS servers is not recommended, then I will happily
   configure new NFS servers on a private network. However, I am left with a
   circular problem. Because the NFS server holding my actual data cannot mount
   automatically after a CS or SSVM restart, the verification checks on my
   templates fail. The templates are 100% downloaded, but their ready state
   shows as "No". In the DB, the `template_view` table shows their state as
   "Migrating". Because the NFS share does not mount, the checks cannot
   complete, and this prevents me from moving my data from the existing NFS
   storage to a new NFS storage.
   
   So if we cannot fix the bug above, is there a way I can set the state of
   these templates from "Migrating" to "Ready" so that I can move my data? I
   have tried to update the table, but I get an SQL error saying that this
   table is not updatable.
   
   And if this is not possible, is there any way I can move my data from this
   NFS storage to a new NFS storage, for example with rsync?
   
   I am REALLY stuck in this loop and would appreciate any help.
   ~~~
   

