[
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164051#comment-17164051
]
David Capwell commented on CASSANDRA-15191:
-------------------------------------------
CI 3.11 -
https://app.circleci.com/pipelines/github/dcapwell/cassandra/309/workflows/b4cbed8d-868f-4640-a697-471fa03fd4bf
CI trunk -
https://app.circleci.com/pipelines/github/dcapwell/cassandra/310/workflows/62969c9b-9c65-4558-9ec0-3fcc3f17d79e
Looks like this patch doesn't play nicely with the commit log; it breaks the
following tests:
commitlog_test.py
- test_ignore_failure_policy
- test_stop_commit_failure_policy
Here is the log from the ignore policy test:
https://1573-209217594-gh.circle-artifacts.com/62/dtest_j8_without_vnodes_logs/1595547611103_test_ignore_failure_policy/node1.log
A sample that stands out:
{code}
ERROR [COMMIT-LOG-ALLOCATOR] 2020-07-23 23:40:08,735 CommitLog.java:499 - Failed managing commit log segments
org.apache.cassandra.io.FSWriteError: java.nio.file.AccessDeniedException: /tmp/dtest-zt17lw0m/test/node1/commitlogs/CommitLog-7-1595547598804.log
    at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:180)
    at org.apache.cassandra.db.commitlog.MemoryMappedSegment.<init>(MemoryMappedSegment.java:45)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.createSegment(CommitLogSegment.java:137)
    at org.apache.cassandra.db.commitlog.CommitLogSegmentManagerStandard.createSegment(CommitLogSegmentManagerStandard.java:66)
    at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager$1.runMayThrow(AbstractCommitLogSegmentManager.java:114)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.AccessDeniedException: /tmp/dtest-zt17lw0m/test/node1/commitlogs/CommitLog-7-1595547598804.log
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
    at java.nio.channels.FileChannel.open(FileChannel.java:287)
    at java.nio.channels.FileChannel.open(FileChannel.java:335)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:175)
    ... 7 common frames omitted
ERROR [COMMIT-LOG-ALLOCATOR] 2020-07-23 23:40:09,736 DefaultFSErrorHandler.java:66 - Stopping transports as disk_failure_policy is stop
{code}
Looks like the commit failure policy isn't respected; instead we fall back to
the normal disk failure policy (disk_failure_policy).
[~stefan.miklosovic] can you look into this?
> stop_paranoid disk failure policy is ignored on CorruptSSTableException after
> node is up
> ----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Config
> Reporter: Vincent White
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
> Attachments: log.txt
>
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> When disk_failure_policy is set to stop_paranoid and a
> CorruptSSTableException is thrown after the server is up, the setting is
> ignored. The node should stop gossip and transport, but instead it continues
> to serve requests and only logs the exception.
>
> This patch unifies the exception handling in JVMStabilityInspector and
> reworks the code so that this inspector acts as the central place where
> such exceptions are inspected.
>
> The exception is ignored because what AbstractLocalAwareExecutorService
> throws is not a CorruptSSTableException but a RuntimeException that has the
> CorruptSSTableException as its cause. It is therefore better to handle this
> in JVMStabilityInspector, which can examine the cause chain recursively and
> act accordingly.
> Behaviour before:
> stop_paranoid of disk_failure_policy is ignored when CorruptSSTableException
> is thrown, e.g. on a regular select statement
> Behaviour after:
> Gossip and transport (CQL) are turned off; the JVM stays up for further
> investigation, e.g. via JMX.
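
For readers following the root cause in the issue description, here is a minimal sketch of why a plain type check misses a wrapped exception while a walk over the cause chain finds it. This is not the patch itself: CorruptDataException is a hypothetical stand-in for CorruptSSTableException, and findCause is an illustrative helper, not a JVMStabilityInspector method.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical stand-in for Cassandra's CorruptSSTableException (illustration only).
class CorruptDataException extends RuntimeException {
    CorruptDataException(String message) { super(message); }
}

final class CauseChainSketch {
    /** Returns the first throwable in the cause chain that is an instance of type, or null. */
    static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
        // Identity-based set so that a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (type.isInstance(cur)) {
                return type.cast(cur);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // An executor rethrows the original failure wrapped in a RuntimeException.
        Throwable wrapped = new RuntimeException(new CorruptDataException("bad sstable"));

        // A direct type check on the outer exception does not match.
        System.out.println(wrapped instanceof CorruptDataException);              // false
        // Walking the cause chain finds the wrapped exception.
        System.out.println(findCause(wrapped, CorruptDataException.class) != null); // true
    }
}
```

A handler that only checks the outermost type would log the RuntimeException and carry on, which matches the "Behaviour before" described above.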