repair leaving FDs unclosed
---------------------------
Key: CASSANDRA-1752
URL: https://issues.apache.org/jira/browse/CASSANDRA-1752
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Jonathan Ellis
Fix For: 0.6.9
"We noticed that after a `nodetool repair` was ran, several of our nodes
reported high disk usage; -- even one node hit 100% disk usage. After a restart
of that node, disk usage drop instantly by 80 gigabytes -- well that was
confusing, but we quickly formed the theory that Cassandra must of been holding
open references to deleted file descriptors.
"Later, i found this node as an example, it is using about 8-10 gigabytes more
than it should be -- 118 gigabytes reported by df, yet du reports only 106
gigabytes in the cassandra directory (nothing else on the mahcine). As you can
see from the lsof listing, it is holding open FDs to files that no longer exist
on the filesystem, and there are no open streams or as far as I can tell other
reasons for the deleted sstable to be open.
"This seems to be related to running a repair, as we haven't seen it in any
other situations before."
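For context on the df/du discrepancy described above: on POSIX filesystems,
unlinking a file only removes its directory entry; the inode and its data
blocks are not freed until the last open descriptor is closed. A minimal,
self-contained sketch of that behavior (the file name and size here are
illustrative, not taken from the report):

{code}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class DeletedFdDemo
{
    public static void main(String[] args) throws IOException
    {
        // illustrative temp file standing in for an sstable
        File f = File.createTempFile("sstable-demo", "-Data.db");
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        raf.write(new byte[1024 * 1024]); // occupy ~1 MB on disk

        // unlink the file while the descriptor is still open
        if (!f.delete())
            throw new IOException("delete failed");

        // du (which walks directory entries) no longer sees the file,
        // but df (which counts allocated blocks) still does, and the
        // data remains readable through the open descriptor
        raf.seek(0);
        System.out.println("first byte after unlink: " + raf.read());

        raf.close(); // only now does the kernel free the space
    }
}
{code}

While such a process is running, lsof shows the unlinked file with a
"(deleted)" suffix, which matches the listing the reporter describes.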
A quick check of FileStreamTask shows that the obvious base is covered:
{code}
finally
{
    try
    {
        raf.close();
    }
    catch (IOException e)
    {
        throw new AssertionError(e);
    }
}
{code}
So it seems that either the transfer loop never finishes, and so never reaches
that finally block (in which case, why isn't it showing up in outbound
streams?), or something else is the problem.
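For what it's worth, one hypothetical pattern that leaks FDs despite a
close() in a finally block is reopening the file on retry without closing the
previous handle; the finally then closes only the most recent descriptor. A
sketch (the retry loop, names, and transfer stub are illustrative -- this is
not the actual FileStreamTask code):

{code}
import java.io.IOException;
import java.io.RandomAccessFile;

public class LeakyRetrySketch
{
    static final int MAX_ATTEMPTS = 3;

    // stand-in for the actual socket transfer
    static void transfer(RandomAccessFile raf) throws IOException
    {
        throw new IOException("simulated transfer failure");
    }

    public static void leakyStream(String path) throws IOException
    {
        RandomAccessFile raf = null;
        try
        {
            for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++)
            {
                // BUG: each retry opens a new descriptor and orphans the
                // previous one; only the last is ever closed below
                raf = new RandomAccessFile(path, "r");
                try
                {
                    transfer(raf);
                    return;
                }
                catch (IOException e)
                {
                    // swallow and retry without closing the failed handle
                }
            }
        }
        finally
        {
            if (raf != null)
                raf.close(); // closes only the most recent descriptor
        }
    }
}
{code}

If something along those lines were happening, the leak would grow with each
retried transfer, which could plausibly add up to the tens of gigabytes the
reporter saw.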