bwsw opened a new issue #3838: SSVM bug when scanning templates URL: https://github.com/apache/cloudstack/issues/3838 <!-- Verify first that your issue/request is not already reported on GitHub. Also test if the latest release and master branch are affected too. Always add information AFTER of these HTML comments, but no need to delete the comments. --> ##### ISSUE TYPE <!-- Pick one below and delete the rest --> * Bug Report ##### COMPONENT NAME <!-- Categorize the issue, e.g. API, VR, VPN, UI, etc. --> ~~~ SSVM ~~~ ##### CLOUDSTACK VERSION <!-- New line separated list of affected versions, commit ID for issues on master branch. --> ~~~ 4.11.2 ~~~ ##### CONFIGURATION <!-- Information about the configuration if relevant, e.g. basic network, advanced networking, etc. N/A otherwise --> ##### OS / ENVIRONMENT <!-- Information about the environment if relevant, N/A otherwise --> N/A ##### SUMMARY <!-- Explain the problem/feature briefly --> Several months ago I found that SSVM occasionally loses all the templates (we have a lot of them) causing massive redownloading which takes a lot of time. Today I started the investigation and found following logs on SSVM (DEBUG enabled): ``` 2020-01-26 02:58:39,710 DEBUG [cloud.agent.Agent] (agentRequest-Handler-1:null) Processing command: com.cloud.agent.api.storage.ListTemplateCommand 2020-01-26 02:58:39,710 INFO [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-1:null) Determined host 10.252.2.4 corresponds to IP 10.252.2.4 2020-01-26 02:58:39,711 DEBUG [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-1:null) Mounting device with nfs-style path of 10.252.2.4:/secondary 2020-01-26 02:58:39,711 DEBUG [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-1:null) mount nfs://10.252.2.4/secondary on /mnt/SecStorage/c082d67e-cc0c-3655-adbf-cfb545c7beec 2020-01-26 02:58:39,711 DEBUG [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-1:null) making available /mnt/SecStorage/c082d67e-cc0c-3655-adbf-cfb545c7beec on nfs://10.252.2.4/seconda ry 2020-01-26 02:58:39,711 DEBUG [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-1:null) local folder for mount will be /mnt/SecStorage/c082d67e-cc0c-3655-adbf-cfb545c7beec 2020-01-26 02:58:39,712 DEBUG [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-1:null) Executing: mount 2020-01-26 02:58:39,713 DEBUG [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-1:null) Executing while with timeout : 1440000 2020-01-26 02:58:39,714 DEBUG [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-1:null) Execution is successful. 2020-01-26 02:58:39,715 DEBUG [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-1:null) Some device already mounted at /mnt/SecStorage/c082d67e-cc0c-3655-adbf-cfb545c7beec, no need to m ount nfs://10.252.2.4/secondary 2020-01-26 02:58:39,718 DEBUG [storage.template.DownloadManagerImpl] (agentRequest-Handler-1:null) Executing: /usr/local/cloud/systemvm/scripts/storage/secondary/listvmtmplt.sh -r /mnt/SecStorage/c082d67e -cc0c-3655-adbf-cfb545c7beec/template/tmpl/ 2020-01-26 02:58:39,720 DEBUG [storage.template.DownloadManagerImpl] (agentRequest-Handler-1:null) Executing while with timeout : 3600000 2020-01-26 02:58:40,170 DEBUG [storage.template.DownloadManagerImpl] (agentRequest-Handler-1:null) Exit value is 143 2020-01-26 02:58:40,175 WARN [storage.template.DownloadManagerImpl] (agentRequest-Handler-1:null) Exception: /usr/local/cloud/systemvm/scripts/storage/secondary/listvmtmplt.sh -r /mnt/SecStorage/c082d67e -cc0c-3655-adbf-cfb545c7beec/template/tmpl/ java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170) at java.io.BufferedInputStream.read(BufferedInputStream.java:336) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:161) at java.io.BufferedReader.readLine(BufferedReader.java:324) at java.io.BufferedReader.readLine(BufferedReader.java:389) at com.cloud.utils.script.OutputInterpreter.processError(OutputInterpreter.java:37) at com.cloud.utils.script.Script.execute(Script.java:289) at org.apache.cloudstack.storage.template.DownloadManagerImpl.listTemplates(DownloadManagerImpl.java:809) at org.apache.cloudstack.storage.template.DownloadManagerImpl.gatherTemplateInfo(DownloadManagerImpl.java:824) at org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.execute(NfsSecondaryStorageResource.java:2052) at org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.executeRequest(NfsSecondaryStorageResource.java:306) at com.cloud.agent.Agent.processRequest(Agent.java:645) at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1083) at com.cloud.utils.nio.Task.call(Task.java:83) at com.cloud.utils.nio.Task.call(Task.java:29) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2020-01-26 02:58:40,182 INFO [cloud.agent.Agent] (AgentShutdownThread:null) Stopping the agent: Reason = sig.kill 2020-01-26 02:58:40,187 INFO [storage.template.DownloadManagerImpl] (agentRequest-Handler-1:null) found 0 templates[] 2020-01-26 02:58:40,188 DEBUG [cloud.agent.Agent] (AgentShutdownThread:null) Sending shutdown to management server 2020-01-26 02:58:40,188 DEBUG [cloud.agent.Agent] (agentRequest-Handler-1:null) Seq 190-6215811910701416453: { Ans: , MgmtId: 179116088862604, via: 190, Ver: v1, Flags: 110, [{"com.cloud.agent.api.storage.ListTemplateAnswer":{"secUrl":"nfs://10.252.2.4/secondary","templateInfos":{},"result":true,"details":"success","wait":0}}] } 2020-01-26 02:58:40,240 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) Location 1: Socket Socket[addr=/10.252.2.1,port=8250,localport=39288] closed on read. Probably -1 returned: Connection closed with -1 on reading size. ``` I suppose the reason SSVM starts redownload templates is that `/usr/local/cloud/systemvm/scripts/storage/secondary/listvmtmplt.sh` is killed by someone. I try to execute `/usr/local/cloud/systemvm/scripts/storage/secondary/listvmtmplt.sh -r /mnt/SecStorage/c082d67e-cc0c-3655-adbf-cfb545c7beec/template/tmpl/` manualy, it works well. Timeout of `3600000` is also fine, templates are listed in 30 seconds. Tried to find OOM kill, no such information in dmesg: ``` ``` ##### STEPS TO REPRODUCE <!-- For bugs, show exactly how to reproduce the problem, using a minimal test-case. Use Screenshots if accurate. For new features, show how the feature would be used. --> <!-- Paste example playbooks or commands between quotes below --> ~~~ Don't know how to reproduce in arbitrary env. ~~~ <!-- You can also paste gist.github.com links for larger files --> ##### EXPECTED RESULTS <!-- What did you expect to happen when running the steps above? --> ~~~ SSVM doesn't download 'good' templates more than one time. ~~~ ##### ACTUAL RESULTS <!-- What actually happened? --> <!-- Paste verbatim command output between quotes below --> ~~~ SSVM redownloads whole SSVM templates occasionally. ~~~
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
