[ 
https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284501#comment-13284501
 ] 

Jonathan Ellis commented on CASSANDRA-2118:
-------------------------------------------

I don't think we need more than two options.  It's common for disks to become 
readable-not-writable, but I've never heard of them being 
writable-not-readable.  Assuming that we address CASSANDRA-2116 at the right 
level of granularity (the disk) there are two sane options:

# Continue as best we can in the face of errors: If we can't write to a disk, 
log an error, mark it bad-for-writes, and continue writing to other disks.  If 
we can't read from a disk, log an error, mark it bad-for-reads-and-writes, and 
continue serving reads from other disks.
# Since option one implies that we can blithely serve up stale data when the 
most recent version was on the disk that is no longer accessible, I can see the 
utility of an option to halt on error (which would allow an operator to choose 
to decommission + rebootstrap to minimize the inconsistencies observed at 
CL.ONE).
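
The two options above can be sketched roughly as follows. This is only an 
illustration in Python, not the attached patch: DiskTracker, DiskState, 
HaltOnDiskError and the halt_on_error flag are hypothetical names, and the 
real change would live at whatever level CASSANDRA-2116 detects errors.

```python
from enum import Enum


class DiskState(Enum):
    HEALTHY = "healthy"
    BAD_FOR_WRITES = "bad-for-writes"
    BAD_FOR_READS_AND_WRITES = "bad-for-reads-and-writes"


class HaltOnDiskError(Exception):
    """Raised when the (hypothetical) halt-on-error option is enabled."""


class DiskTracker:
    """Tracks per-disk health; all names here are illustrative assumptions."""

    def __init__(self, disks, halt_on_error=False):
        self.state = {disk: DiskState.HEALTHY for disk in disks}
        self.halt_on_error = halt_on_error

    def on_write_error(self, disk):
        # A failed write only rules the disk out for further writes.
        self._fail(disk, DiskState.BAD_FOR_WRITES)

    def on_read_error(self, disk):
        # A disk we cannot read from is assumed unusable for writes too.
        self._fail(disk, DiskState.BAD_FOR_READS_AND_WRITES)

    def _fail(self, disk, new_state):
        # Never move a disk from read-failed back to write-failed only.
        if self.state[disk] is not DiskState.BAD_FOR_READS_AND_WRITES:
            self.state[disk] = new_state
        print(f"ERROR: disk {disk} is now {self.state[disk].value}")
        if self.halt_on_error:
            # Option two: stop rather than risk serving stale data.
            raise HaltOnDiskError(disk)

    def writable(self):
        return [d for d, s in self.state.items() if s is DiskState.HEALTHY]

    def readable(self):
        return [d for d, s in self.state.items()
                if s is not DiskState.BAD_FOR_READS_AND_WRITES]
```

With halt_on_error left off, a write error on one disk removes it from 
writable() but keeps it in readable(); a read error removes it from both.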
                
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8 beta 1
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>         Attachments: 
> 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 
> 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 
> 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a 
> mode in cassandra.yaml so operators can decide what to do in the event of 
> failure:
> 1) standard - means continue on all errors (default)
> 2) read - means stop gossip/rpc server only if reads from the drive fail; 
> writes can fail without killing gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.
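
For reference, the three modes in the description amount to a small decision 
table. The sketch below is a hedged illustration in Python; the function name 
should_stop_servers is hypothetical, and only the mode names (standard, read, 
readwrite) come from the issue text.

```python
def should_stop_servers(mode, failed_operation):
    """Return True if the gossip/rpc server should stop after a failure.

    mode is one of the names from the issue description; failed_operation
    is "read" or "write". The function itself is an illustrative assumption.
    """
    if failed_operation not in ("read", "write"):
        raise ValueError(f"unknown operation: {failed_operation!r}")
    if mode == "standard":
        # Continue on all errors (the proposed default).
        return False
    if mode == "read":
        # Only failed reads stop the servers; failed writes are tolerated.
        return failed_operation == "read"
    if mode == "readwrite":
        # Any read or write error stops the servers.
        return True
    raise ValueError(f"unknown mode: {mode!r}")
```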
