bwsw opened a new issue #3838: SSVM bug when scanning templates
URL: https://github.com/apache/cloudstack/issues/3838
 
 
   <!--
   Verify first that your issue/request is not already reported on GitHub.
   Also test if the latest release and master branch are affected too.
   Always add information AFTER of these HTML comments, but no need to delete 
the comments.
   -->
   
   ##### ISSUE TYPE
   <!-- Pick one below and delete the rest -->
    * Bug Report
   
   ##### COMPONENT NAME
   <!--
   Categorize the issue, e.g. API, VR, VPN, UI, etc.
   -->
   ~~~
   SSVM
   ~~~
   
   ##### CLOUDSTACK VERSION
   <!--
   New line separated list of affected versions, commit ID for issues on master 
branch.
   -->
   
   ~~~
   4.11.2
   ~~~
   
   ##### CONFIGURATION
   <!--
   Information about the configuration if relevant, e.g. basic network, 
advanced networking, etc.  N/A otherwise
   -->
   
   
   ##### OS / ENVIRONMENT
   <!--
   Information about the environment if relevant, N/A otherwise
   -->
   N/A
   
   ##### SUMMARY
   <!-- Explain the problem/feature briefly -->
   
   Several months ago I found that SSVM occasionally loses all the templates 
(we have a lot of them) causing massive redownloading which takes a lot of 
time. Today I started the investigation and found following logs on SSVM (DEBUG 
enabled):
   
   ```
   2020-01-26 02:58:39,710 DEBUG [cloud.agent.Agent] 
(agentRequest-Handler-1:null) Processing command: 
com.cloud.agent.api.storage.ListTemplateCommand
   2020-01-26 02:58:39,710 INFO  [storage.resource.NfsSecondaryStorageResource] 
(agentRequest-Handler-1:null) Determined host 10.252.2.4 corresponds to IP 
10.252.2.4
   2020-01-26 02:58:39,711 DEBUG [storage.resource.NfsSecondaryStorageResource] 
(agentRequest-Handler-1:null) Mounting device with nfs-style path of 
10.252.2.4:/secondary
   2020-01-26 02:58:39,711 DEBUG [storage.resource.NfsSecondaryStorageResource] 
(agentRequest-Handler-1:null) mount nfs://10.252.2.4/secondary on 
/mnt/SecStorage/c082d67e-cc0c-3655-adbf-cfb545c7beec
   2020-01-26 02:58:39,711 DEBUG [storage.resource.NfsSecondaryStorageResource] 
(agentRequest-Handler-1:null) making available 
/mnt/SecStorage/c082d67e-cc0c-3655-adbf-cfb545c7beec on nfs://10.252.2.4/seconda
   ry
   2020-01-26 02:58:39,711 DEBUG [storage.resource.NfsSecondaryStorageResource] 
(agentRequest-Handler-1:null) local folder for mount will be 
/mnt/SecStorage/c082d67e-cc0c-3655-adbf-cfb545c7beec
   2020-01-26 02:58:39,712 DEBUG [storage.resource.NfsSecondaryStorageResource] 
(agentRequest-Handler-1:null) Executing: mount 
   2020-01-26 02:58:39,713 DEBUG [storage.resource.NfsSecondaryStorageResource] 
(agentRequest-Handler-1:null) Executing while with timeout : 1440000
   2020-01-26 02:58:39,714 DEBUG [storage.resource.NfsSecondaryStorageResource] 
(agentRequest-Handler-1:null) Execution is successful.
   2020-01-26 02:58:39,715 DEBUG [storage.resource.NfsSecondaryStorageResource] 
(agentRequest-Handler-1:null) Some device already mounted at 
/mnt/SecStorage/c082d67e-cc0c-3655-adbf-cfb545c7beec, no need to m
   ount nfs://10.252.2.4/secondary
   2020-01-26 02:58:39,718 DEBUG [storage.template.DownloadManagerImpl] 
(agentRequest-Handler-1:null) Executing: 
/usr/local/cloud/systemvm/scripts/storage/secondary/listvmtmplt.sh -r 
/mnt/SecStorage/c082d67e
   -cc0c-3655-adbf-cfb545c7beec/template/tmpl/ 
   2020-01-26 02:58:39,720 DEBUG [storage.template.DownloadManagerImpl] 
(agentRequest-Handler-1:null) Executing while with timeout : 3600000
   2020-01-26 02:58:40,170 DEBUG [storage.template.DownloadManagerImpl] 
(agentRequest-Handler-1:null) Exit value is 143
   2020-01-26 02:58:40,175 WARN  [storage.template.DownloadManagerImpl] 
(agentRequest-Handler-1:null) Exception: 
/usr/local/cloud/systemvm/scripts/storage/secondary/listvmtmplt.sh -r 
/mnt/SecStorage/c082d67e
   -cc0c-3655-adbf-cfb545c7beec/template/tmpl/ 
   java.io.IOException: Stream closed
           at 
java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)
           at java.io.BufferedInputStream.read(BufferedInputStream.java:336)
           at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
           at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
           at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
           at java.io.InputStreamReader.read(InputStreamReader.java:184)
           at java.io.BufferedReader.fill(BufferedReader.java:161)
           at java.io.BufferedReader.readLine(BufferedReader.java:324)
           at java.io.BufferedReader.readLine(BufferedReader.java:389)
           at 
com.cloud.utils.script.OutputInterpreter.processError(OutputInterpreter.java:37)
           at com.cloud.utils.script.Script.execute(Script.java:289)
           at 
org.apache.cloudstack.storage.template.DownloadManagerImpl.listTemplates(DownloadManagerImpl.java:809)
           at 
org.apache.cloudstack.storage.template.DownloadManagerImpl.gatherTemplateInfo(DownloadManagerImpl.java:824)
           at 
org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.execute(NfsSecondaryStorageResource.java:2052)
           at 
org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.executeRequest(NfsSecondaryStorageResource.java:306)
           at com.cloud.agent.Agent.processRequest(Agent.java:645)
           at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1083)
           at com.cloud.utils.nio.Task.call(Task.java:83)
           at com.cloud.utils.nio.Task.call(Task.java:29)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   2020-01-26 02:58:40,182 INFO  [cloud.agent.Agent] (AgentShutdownThread:null) 
Stopping the agent: Reason = sig.kill
   2020-01-26 02:58:40,187 INFO  [storage.template.DownloadManagerImpl] 
(agentRequest-Handler-1:null) found 0 templates[]
   2020-01-26 02:58:40,188 DEBUG [cloud.agent.Agent] (AgentShutdownThread:null) 
Sending shutdown to management server
   2020-01-26 02:58:40,188 DEBUG [cloud.agent.Agent] 
(agentRequest-Handler-1:null) Seq 190-6215811910701416453:  { Ans: , MgmtId: 
179116088862604, via: 190, Ver: v1, Flags: 110, 
[{"com.cloud.agent.api.storage.ListTemplateAnswer":{"secUrl":"nfs://10.252.2.4/secondary","templateInfos":{},"result":true,"details":"success","wait":0}}]
 }
   2020-01-26 02:58:40,240 DEBUG [utils.nio.NioConnection] 
(Agent-NioConnectionHandler-1:null) Location 1: Socket 
Socket[addr=/10.252.2.1,port=8250,localport=39288] closed on read.  Probably -1 
returned: Connection closed with -1 on reading size.
   ```
   
   I suppose the reason SSVM starts redownload templates is that 
`/usr/local/cloud/systemvm/scripts/storage/secondary/listvmtmplt.sh` is killed 
by someone. I try to execute 
`/usr/local/cloud/systemvm/scripts/storage/secondary/listvmtmplt.sh -r 
/mnt/SecStorage/c082d67e-cc0c-3655-adbf-cfb545c7beec/template/tmpl/` manualy, 
it works well. Timeout of `3600000` is also fine, templates are listed in 30 
seconds.
   
   Tried to find OOM kill, no such information in dmesg:
   
   ```
   
   ```
   
   
   ##### STEPS TO REPRODUCE
   <!--
   For bugs, show exactly how to reproduce the problem, using a minimal 
test-case. Use Screenshots if accurate.
   
   For new features, show how the feature would be used.
   -->
   
   <!-- Paste example playbooks or commands between quotes below -->
   ~~~
   Don't know how to reproduce in arbitrary env.
   ~~~
   
   <!-- You can also paste gist.github.com links for larger files -->
   
   ##### EXPECTED RESULTS
   <!-- What did you expect to happen when running the steps above? -->
   
   ~~~
   SSVM doesn't download 'good' templates more than one time.
   ~~~
   
   ##### ACTUAL RESULTS
   <!-- What actually happened? -->
   
   <!-- Paste verbatim command output between quotes below -->
   ~~~
   SSVM redownloads whole SSVM templates occasionally.
   ~~~
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to