[
https://issues.apache.org/jira/browse/CASSANDRA-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Ellis resolved CASSANDRA-809.
--------------------------------------
Resolution: Duplicate
Fix Version/s: (was: 1.2)
Created https://issues.apache.org/jira/browse/CASSANDRA-4292 to pursue the
thread-per-disk idea. CASSANDRA-2116 and CASSANDRA-2118 address the issue of
what to do when disks error out.
> Full disk can result in being marked down
> -----------------------------------------
>
> Key: CASSANDRA-809
> URL: https://issues.apache.org/jira/browse/CASSANDRA-809
> Project: Cassandra
> Issue Type: Bug
> Reporter: Ryan King
> Priority: Minor
>
> We had a node fill up the disk under one of two data directories. The result
> was that the node stopped making progress. The problem appears to be this
> (I'll update with more details as we find them):
> When new tasks are put onto most queues in Cassandra, if there isn't a thread
> in the pool to handle the task immediately, the task is run in the caller's
> thread
> (org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor:69 sets the
> caller-runs policy). The queue in question here is the one that manages
> flushes, which receives tasks from various places in our code (and therefore
> likely from multiple threads). Assuming that the full disk meant that no
> flushing thread could make progress (it appears that way), eventually
> any thread that calls the flush code would become stalled.
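
To illustrate the stall mechanism described above, here is a minimal Java
sketch using only java.util.concurrent. It is not Cassandra's code: the class
name CallerRunsDemo, the pool size of one, and the latch standing in for a
flush blocked on a full disk are all assumptions made for illustration. It
shows that once the only pool thread is busy and no queue slot is free, a
caller-runs policy executes the next task on the submitting thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; not part of Cassandra.
class CallerRunsDemo {

    /** Returns the name of the thread that ran the task the pool could not accept. */
    static String threadThatRanOverflowTask() {
        // One worker, no queue capacity, caller-runs on rejection.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch diskFreed = new CountDownLatch(1);   // stands in for "disk has space again"
        CountDownLatch workerBusy = new CountDownLatch(1);
        AtomicReference<String> ranOn = new AtomicReference<String>();
        try {
            // First task occupies the only worker, like a flush stuck on a full disk.
            pool.execute(() -> {
                workerBusy.countDown();
                try {
                    diskFreed.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workerBusy.await();
            // The pool cannot accept this task, so the policy runs it right here,
            // on the submitting thread, before execute() returns.
            pool.execute(() -> ranOn.set(Thread.currentThread().getName()));
            return ranOn.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the worker", e);
        } finally {
            diskFreed.countDown();   // unblock the worker so the pool can shut down
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("overflow task ran on: " + threadThatRanOverflowTask());
    }
}
```

In this sketch the overflow task is trivial, so the caller returns at once. If
that task were itself a flush against the full disk, the submitting thread
would block in the same way, which is the stall the report describes.
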
> Assuming our analysis is right (and we're still looking into it) we need to
> make a change. Here's a proposal so far:
> SHORT TERM:
> * change the ThreadPoolExecutor policy to not be caller runs. This will let
> other threads make progress in the event that one pool is stalled.
> LONG TERM:
> * It appears that there are n threads for n data directories that we flush
> to, but they're not dedicated to a data directory. We should have a thread
> per data directory and have that thread dedicated to that directory
> * Perhaps we could use the failure detector on disks?
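
The thread-per-data-directory proposal above could be sketched roughly as
follows. This is only an illustration under stated assumptions, not the design
pursued in CASSANDRA-4292: the class PerDirectoryFlusher, its methods, and the
directory names /data1 and /data2 are hypothetical. Each directory gets its
own single-threaded executor, so a flush that hangs on one directory cannot
occupy the thread serving another.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; names and structure are assumptions, not Cassandra's API.
class PerDirectoryFlusher {
    // One dedicated single-threaded executor per data directory.
    private final Map<String, ExecutorService> writers = new LinkedHashMap<String, ExecutorService>();

    PerDirectoryFlusher(List<String> dataDirectories) {
        for (String dir : dataDirectories) {
            writers.put(dir, Executors.newSingleThreadExecutor());
        }
    }

    /** Queues a flush on the thread dedicated to the given directory. */
    Future<?> submitFlush(String dir, Runnable flush) {
        ExecutorService writer = writers.get(dir);
        if (writer == null) {
            throw new IllegalArgumentException("unknown data directory: " + dir);
        }
        return writer.submit(flush);
    }

    /** Interrupts all writer threads and stops accepting new flushes. */
    void shutdownNow() {
        for (ExecutorService writer : writers.values()) {
            writer.shutdownNow();
        }
    }

    /** Returns true if a flush to a healthy directory completes while another directory is stuck. */
    static boolean healthyDirectoryStillFlushes() {
        PerDirectoryFlusher flusher = new PerDirectoryFlusher(Arrays.asList("/data1", "/data2"));
        CountDownLatch diskFreed = new CountDownLatch(1);   // never released: simulates a full disk
        try {
            flusher.submitFlush("/data1", () -> {
                try {
                    diskFreed.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            Future<?> healthy = flusher.submitFlush("/data2", () -> { /* pretend to write an SSTable */ });
            healthy.get(5, TimeUnit.SECONDS);
            return true;
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            return false;
        } finally {
            flusher.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("healthy directory flushed: " + healthyDirectoryStillFlushes());
    }
}
```

One trade-off in this sketch: Executors.newSingleThreadExecutor() uses an
unbounded queue, so callers never block, but flushes queued for a stuck
directory accumulate in memory until something (for example a disk failure
detector, as suggested above) stops routing work to it.
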