[
https://issues.apache.org/jira/browse/CLOUDSTACK-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664327#comment-13664327
]
Caleb Call commented on CLOUDSTACK-105:
---------------------------------------
I'll be happy to attach the dump, but this isn't something that only happens
occasionally. It's happening constantly. To keep our servers from crashing, we
have to run a cronjob that removes any of these files that are more than a
couple of days old. I also don't think this is necessarily a Xenserver bug; it
may be a bug in Xenserver under CloudStack, since this never happens until
Xenserver is joined to CloudStack. Once it's joined, it starts happening. I
also have a suspicion that it's being caused by the script
/etc/xapi.d/plugins/vmops, and in particular by this part of that script
(sorry, I'm sure jira is going to munge this output):
def setLinkLocalIP(session, args):
    brName = args['brName']
    try:
        cmd = ["ip", "route", "del", "169.254.0.0/16"]
        txt = util.pread2(cmd)
    except:
        txt = ''
    try:
        cmd = ["ifconfig", brName, "169.254.0.1", "netmask", "255.255.0.0"]
        txt = util.pread2(cmd)
    except:
        try:
            cmd = ['cat', '/etc/xensource/network.conf']
            result = util.pread2(cmd)
        except:
            return 'can not cat network.conf'
        if result.lower() == "bridge":
            try:
                cmd = ["brctl", "addbr", brName]
                txt = util.pread2(cmd)
            except:
                pass
        else:
            try:
                cmd = ["ovs-vsctl", "add-br", brName]
                txt = util.pread2(cmd)
            except:
                pass
        try:
            cmd = ["ifconfig", brName, "169.254.0.1", "netmask", "255.255.0.0"]
            txt = util.pread2(cmd)
        except:
            pass
    try:
        cmd = ["ip", "route", "add", "169.254.0.0/16", "dev", brName, "src",
               "169.254.0.1"]
        txt = util.pread2(cmd)
    except:
        txt = ''
    txt = 'success'
    return txt
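
The cronjob workaround mentioned above (removing stale stream-unix.* sockets
from /tmp once they are more than a couple of days old) could be sketched
roughly as follows. This is only an illustration, not the actual cronjob: the
function name remove_stale_sockets, the two-day default, and the dry_run flag
are assumptions made for the example, and it is written for a current Python
rather than the interpreter shipped in dom0.

```python
import glob
import os
import stat
import time


def remove_stale_sockets(directory="/tmp", pattern="stream-unix.*",
                         max_age_days=2, dry_run=True, now=None):
    """Return stale socket paths in `directory`; delete them unless dry_run.

    Hypothetical helper: a path counts as stale when it is a socket file
    whose modification time is older than max_age_days.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 86400
    stale = []
    for path in glob.glob(os.path.join(directory, pattern)):
        try:
            info = os.lstat(path)
        except OSError:
            continue  # the file vanished between listing and stat
        # Skip anything that is not a socket, or that is still recent.
        if stat.S_ISSOCK(info.st_mode) and info.st_mtime < cutoff:
            stale.append(path)
            if not dry_run:
                try:
                    os.unlink(path)
                except OSError:
                    pass  # already removed or not permitted; leave it
    return sorted(stale)
```

Running it with the default dry_run=True only lists what would be removed,
which is a reasonable first step before letting it delete anything on a
production host.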
> /tmp/stream-unix.####.###### stale sockets causing inodes to run out on
> Xenserver
> ---------------------------------------------------------------------------------
>
> Key: CLOUDSTACK-105
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-105
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the
> default.)
> Components: Third-Party Bugs
> Affects Versions: pre-4.0.0
> Environment: Xenserver 6.0.2
> Cloudstack 3.0.2
> Reporter: Caleb Call
> Assignee: Devdeep Singh
> Fix For: 4.1.0
>
> Attachments: messages
>
>
> We came across an interesting issue in one of our clusters. We ran out of
> inodes on all of our cluster members (since when does this happen in 2012?).
> When this happened, it in turn made the / filesystem read-only, which in
> turn made all the hosts go into emergency maintenance mode and, as a
> result, get marked down by Cloudstack. We found that it was caused by
> hundreds of thousands of stale socket files in /tmp named
> "stream-unix.####.######". To resolve the issue, we had to delete those
> stale socket files (find /tmp -name "*stream*" -mtime +7 -exec rm -v {} \;),
> then kill and restart xapi, then correct the emergency maintenance mode.
> These hosts had only been up for 45 days before this issue occurred.
> In our scouring of the interwebs, the only other instance we've been able to
> find of this (or similar) happening is in the same setup we are currently
> running. Xenserver 6.0.2 with CS 3.0.2. Do these stream-unix sockets have
> anything to do with Cloudstack? I would think if this was a Xenserver issue
> (bug), there would be a lot more on the internet about this happening. For a
> temporary workaround, we've added a cronjob to clean up these files, but we'd
> really like to address the actual issue that's causing these sockets to
> become stale and not get cleaned up.
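
Since the failure described above was inode exhaustion rather than a full
disk, a small check of inode usage may help catch the problem before the root
filesystem goes read-only. The following is a minimal sketch using
os.statvfs; the function name inode_usage and the idea of alerting on the
returned percentage are assumptions for illustration, not part of the
reported setup.

```python
import os


def inode_usage(path="/"):
    """Return (used, total, percent_used) inodes for the filesystem at path."""
    st = os.statvfs(path)
    total = st.f_files           # total inodes on the filesystem
    used = total - st.f_ffree    # inodes currently in use
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    percent = 100.0 * used / total if total else 0.0
    return used, total, percent


if __name__ == "__main__":
    used, total, percent = inode_usage("/")
    print("inodes used: %d of %d (%.1f%%)" % (used, total, percent))
```

A monitoring job could call this periodically and raise an alert once the
percentage passes some threshold chosen by the operator.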