[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

David Capwell (Jira) Thu, 23 Jul 2020 17:28:28 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164051#comment-17164051
 ]


David Capwell commented on CASSANDRA-15191:
-------------------------------------------

CI 3.11 - https://app.circleci.com/pipelines/github/dcapwell/cassandra/309/workflows/b4cbed8d-868f-4640-a697-471fa03fd4bf
CI trunk - https://app.circleci.com/pipelines/github/dcapwell/cassandra/310/workflows/62969c9b-9c65-4558-9ec0-3fcc3f17d79e

It looks like this patch doesn't play nicely with the commit log; it breaks 
the following tests:

commitlog_test.py
 - test_ignore_failure_policy
 - test_stop_commit_failure_policy

Here is the log from the ignore policy test 
https://1573-209217594-gh.circle-artifacts.com/62/dtest_j8_without_vnodes_logs/1595547611103_test_ignore_failure_policy/node1.log

A sample that stands out:

{code}
ERROR [COMMIT-LOG-ALLOCATOR] 2020-07-23 23:40:08,735 CommitLog.java:499 - Failed managing commit log segments
org.apache.cassandra.io.FSWriteError: java.nio.file.AccessDeniedException: /tmp/dtest-zt17lw0m/test/node1/commitlogs/CommitLog-7-1595547598804.log
        at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:180)
        at org.apache.cassandra.db.commitlog.MemoryMappedSegment.<init>(MemoryMappedSegment.java:45)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.createSegment(CommitLogSegment.java:137)
        at org.apache.cassandra.db.commitlog.CommitLogSegmentManagerStandard.createSegment(CommitLogSegmentManagerStandard.java:66)
        at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager$1.runMayThrow(AbstractCommitLogSegmentManager.java:114)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.AccessDeniedException: /tmp/dtest-zt17lw0m/test/node1/commitlogs/CommitLog-7-1595547598804.log
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
        at java.nio.channels.FileChannel.open(FileChannel.java:287)
        at java.nio.channels.FileChannel.open(FileChannel.java:335)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:175)
        ... 7 common frames omitted
ERROR [COMMIT-LOG-ALLOCATOR] 2020-07-23 23:40:09,736 DefaultFSErrorHandler.java:66 - Stopping transports as disk_failure_policy is stop
{code}

It looks like commit_failure_policy isn't respected for this commit log 
error, and we instead fall back to the normal disk_failure_policy.
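
To make the expected behaviour concrete, here is a minimal sketch of the dispatch the failing tests seem to assume: an error raised while managing commit log segments should be governed by commit_failure_policy, and only other file system errors by disk_failure_policy. This is not code from the patch or from Cassandra. The class, the Origin enum, and governingPolicy are hypothetical names used for illustration; only the policy values mirror the cassandra.yaml options.

```java
import java.util.Locale;

public class FailurePolicySketch {
    // Values mirror the cassandra.yaml options; the enums themselves are illustrative.
    enum CommitFailurePolicy { DIE, STOP, STOP_COMMIT, IGNORE }
    enum DiskFailurePolicy { DIE, STOP_PARANOID, STOP, BEST_EFFORT, IGNORE }

    // Hypothetical classification of where the file system error came from.
    enum Origin { COMMIT_LOG, DATA_FILE }

    // Returns the setting that should decide how the node reacts to the error.
    static String governingPolicy(Origin origin, CommitFailurePolicy commit, DiskFailurePolicy disk) {
        if (origin == Origin.COMMIT_LOG) {
            return "commit_failure_policy=" + commit.name().toLowerCase(Locale.ROOT);
        }
        return "disk_failure_policy=" + disk.name().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Scenario from test_ignore_failure_policy: commit policy is ignore, disk policy is stop.
        System.out.println(governingPolicy(Origin.COMMIT_LOG,
                                           CommitFailurePolicy.IGNORE,
                                           DiskFailurePolicy.STOP));
        // prints "commit_failure_policy=ignore"
    }
}
```

In the log above the node instead reports "Stopping transports as disk_failure_policy is stop", which corresponds to taking the second branch for a commit log error.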

[~stefan.miklosovic] can you look into this?

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15191
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Config
>            Reporter: Vincent White
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 3.11.x, 4.0-beta
>
>         Attachments: log.txt
>
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> There is a bug when disk_failure_policy is set to stop_paranoid and a 
> CorruptSSTableException is thrown after the server is up: the setting is 
> ignored. Normally the node should stop gossip and transport, but it just 
> continues to serve requests and the exception is merely logged.
>  
> This patch unifies the exception handling in JVMStabilityInspector and 
> reworks the code so that this inspector acts as the central place where 
> such exceptions are inspected. 
>  
> The core reason the exception is ignored is that the exception thrown in 
> AbstractLocalAwareExecutorService is not a CorruptSSTableException but a 
> RuntimeException that has it as its cause. It is therefore better to handle 
> this in JVMStabilityInspector, which can examine the cause chain 
> recursively and act accordingly.
> Behaviour before:
> stop_paranoid of disk_failure_policy is ignored when CorruptSSTableException 
> is thrown, e.g. on a regular select statement
> Behaviour after:
> Gossip and transport (CQL) are turned off; the JVM stays up for further 
> investigation, e.g. via JMX.
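
The point in the description about the wrapped exception can be illustrated with a small sketch: a direct type check on the outer RuntimeException misses the corruption error, while walking the cause chain finds it. This is not the JVMStabilityInspector implementation. CorruptDataException is a hypothetical stand-in for CorruptSSTableException, and findCause and the depth limit of 64 are illustrative choices.

```java
import java.util.Optional;

public class CauseChainSketch {
    // Hypothetical stand-in for Cassandra's CorruptSSTableException.
    static class CorruptDataException extends RuntimeException {
        CorruptDataException(String message) { super(message); }
    }

    // Walks the cause chain and returns the first throwable of the requested type.
    // The depth limit guards against a cyclic cause chain.
    static <T extends Throwable> Optional<T> findCause(Throwable error, Class<T> type) {
        int depth = 0;
        for (Throwable current = error; current != null && depth < 64; current = current.getCause()) {
            if (type.isInstance(current)) {
                return Optional.of(type.cast(current));
            }
            depth++;
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // An executor that wraps failures hands the caller a RuntimeException.
        Throwable wrapped = new RuntimeException(new CorruptDataException("corrupt sstable"));

        // A direct type check on the outer exception misses the corruption.
        System.out.println(wrapped instanceof CorruptDataException); // prints "false"

        // Inspecting the cause chain finds it, so the failure policy can be applied.
        System.out.println(findCause(wrapped, CorruptDataException.class).isPresent()); // prints "true"
    }
}
```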



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

