[jira] [Commented] (CASSANDRA-17136) FQL: Enabling via nodetool can trigger disk_failure_mode

Brandon Williams (Jira) Wed, 17 Nov 2021 09:59:07 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-17136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445415#comment-17445415
 ]


Brandon Williams commented on CASSANDRA-17136:
----------------------------------------------

bq. So the thing that got me to uncover this was that fqltool dump command can 
very conveniently create a directory layout just like the one above.

Aha, I see.

bq. looks like we only cleanDirectory when enabling FQL via JMX, so we can 
probably just catch the exception and let the user know.

cleanDirectory is what calls the JVMStabilityInspector, so instead I disabled 
it when called via JMX.
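
To illustrate the idea, here is a minimal sketch of a directory clean whose caller decides whether a failed delete is escalated to node-level failure handling. It is not the actual patch: CleanDirectorySketch, deleteContents, the escalateFailures flag and the fsErrorHandler callback are all hypothetical names, with the callback standing in for whatever applies disk_failure_policy in Cassandra.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not Cassandra code.
class CleanDirectorySketch {

    /**
     * Deletes everything under dir, leaving dir itself in place.
     * On failure the exception is always rethrown to the caller; it is
     * passed to fsErrorHandler first only when escalateFailures is true.
     */
    static void cleanDirectory(Path dir, boolean escalateFailures,
                               Consumer<IOException> fsErrorHandler) throws IOException {
        try {
            deleteContents(dir);
        } catch (IOException e) {
            if (escalateFailures) {
                // Stand-in for node-level handling such as disk_failure_policy.
                fsErrorHandler.accept(e);
            }
            throw e; // an operator-triggered call just sees the error
        }
    }

    private static void deleteContents(Path dir) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            entries = walk.filter(p -> !p.equals(dir))
                          .sorted(Comparator.reverseOrder())
                          .collect(Collectors.toList());
        } catch (UncheckedIOException e) {
            // Files.walk reports traversal errors wrapped; unwrap them.
            throw e.getCause();
        }
        for (Path p : entries) {
            Files.delete(p); // throws AccessDeniedException on permission errors
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fql-sketch");
        Files.createFile(Files.createDirectory(dir.resolve("dir")).resolve("file"));
        // JMX-style call: failures are reported to the caller, not escalated.
        cleanDirectory(dir, false, e -> System.err.println("would escalate: " + e));
        try (Stream<Path> left = Files.list(dir)) {
            System.out.println("entries left: " + left.count());
        }
    }
}
```

With escalateFailures set to false, an AccessDeniedException like the one in the report would still reach the nodetool user as an error, but the handler that stops transports would never run.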

||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-17136]|[circle|https://app.circleci.com/pipelines/github/driftx/cassandra?branch=CASSANDRA-17136], [!https://ci-cassandra.apache.org/job/Cassandra-devbranch/1285/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/1285/pipeline]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-17136]|[circle|https://app.circleci.com/pipelines/github/driftx/cassandra?branch=CASSANDRA-17136-trunk], [!https://ci-cassandra.apache.org/job/Cassandra-devbranch/1285/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/1286/pipeline]|


> FQL: Enabling via nodetool can trigger disk_failure_mode
> --------------------------------------------------------
>
>                 Key: CASSANDRA-17136
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17136
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tool/fql
>            Reporter: Brendan Cicchi
>            Assignee: Brandon Williams
>            Priority: Normal
>             Fix For: 4.0.x
>
>
> When enabling fullquerylog via nodetool, if the location specified via 
> --path contains a non-empty directory whose cleaning triggers a 
> java.nio.file.AccessDeniedException, the node will trigger the 
> disk_failure_policy, which by default is stop. This is a fairly easy way to 
> take a cluster offline if someone executes this in parallel. I don't think 
> that behavior is desirable for enabling via nodetool.
>  
> Repro (1 node cluster already up):
> {code:bash}
> mkdir /some/path/dir
> touch /some/path/dir/file
> chown -R user: /some/path/dir # Non Cassandra process user
> chmod 700 /some/path/dir
> nodetool enablefullquerylog --path /some/path
> {code}
> Nodetool will give back this error:
> {code:java}
> error: /some/path/dir/file
> -- StackTrace --
> java.nio.file.AccessDeniedException: /some/path/dir/file
>       at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>       at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
>       at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
>       at java.nio.file.Files.delete(Files.java:1126)
>       at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:250)
>       at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:237)
>       at org.apache.cassandra.utils.binlog.BinLog.deleteRecursively(BinLog.java:492)
>       at org.apache.cassandra.utils.binlog.BinLog.cleanDirectory(BinLog.java:477)
>       at org.apache.cassandra.utils.binlog.BinLog$Builder.build(BinLog.java:436)
>       at org.apache.cassandra.fql.FullQueryLogger.enable(FullQueryLogger.java:106)
>       at org.apache.cassandra.service.StorageService.enableFullQueryLogger(StorageService.java:5915)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
>       at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>       at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>       at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>       at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>       at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
>       at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>       at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
>       at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
>       at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
>       at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
>       at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
>       at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>       at sun.rmi.transport.Transport$1.run(Transport.java:200)
>       at sun.rmi.transport.Transport$1.run(Transport.java:197)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>       at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
>       at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
>       at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> On the Cassandra side, we see the following:
> {code:java}
> INFO  [RMI TCP Connection(2)-10.101.33.87] 2021-11-11 00:55:40,716 BinLog.java:420 - Attempting to configure bin log: Path: /some/path Roll cycle: HOURLY Blocking: true Max queue weight: 268435456 Max log size:17179869184 Archive command:
> INFO  [RMI TCP Connection(2)-10.101.33.87] 2021-11-11 00:55:40,720 BinLog.java:433 - Cleaning directory: /some/path as requested
> ERROR [RMI TCP Connection(2)-10.101.33.87] 2021-11-11 00:55:40,724 DefaultFSErrorHandler.java:64 - Stopping transports as disk_failure_policy is stop
> ERROR [RMI TCP Connection(2)-10.101.33.87] 2021-11-11 00:55:40,725 StorageService.java:453 - Stopping native transport
> INFO  [RMI TCP Connection(2)-10.101.33.87] 2021-11-11 00:55:40,730 Server.java:171 - Stop listening for CQL clients
> ERROR [RMI TCP Connection(2)-10.101.33.87] 2021-11-11 00:55:40,730 StorageService.java:458 - Stopping gossiper
> WARN  [RMI TCP Connection(2)-10.101.33.87] 2021-11-11 00:55:40,731 StorageService.java:357 - Stopping gossip by operator request
> INFO  [RMI TCP Connection(2)-10.101.33.87] 2021-11-11 00:55:40,731 Gossiper.java:1984 - Announcing shutdown
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
