Thank you very much for everyone's replies, they are all very valuable feedback to me.
I don't really understand what benefit adding restrictions would serve. > Would it be hard coded in C* itself, or configurable? If it's > configurable, then are we just making users enter their commands twice? > This is meant to be used by an operator, so who's actually protected by an > allow-list? > I agree with you too, so I may prefer to idea 2 with implement commitlog archiving in c* (not archiving by user defined shell), and deprecate the commitlog_archiving.properties <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28> configuration through which we can set the properties of commitlog archiving. This view may be similar to that of Scott. If I want to use rclone or aws-cli to archive my commit logs that's my > prerogative. > Yes, it may be very flexible if we set aws-cli in shell. But as I know cassandra-medusa can also do this , and for me letting other tools to do this work may be better , for example we can upload more than one log (if log size is not big ) in a rpc to improve write throughput. I think we can divide this big task into several subtasks: - Add this feature that Stefan mentioned before for commitlog archive CASSANDRA-18550 <https://issues.apache.org/jira/browse/CASSANDRA-18550> in 5.x and may the original commitlog_archiving.properties deprecate. - Add the feature of archiving for cassandra (commitlog/query log/or event sstable) in the long run such as 6.0. I can prepare a cep if necessary. Looking forward to your feedback. We can divide this task into several subtasks and complete them step by step Jordan West <jorda...@gmail.com> 于2024年9月3日周二 00:55写道: > +1 to Scott’s comments. Once you expose those YAML config params outside > of a single node which many of us do, this becomes an RCE attack vector. > Something more structured as Scott proposes, similar to snapshots, would be > preferred. Would recommend a CEP. > > Jordan > > On Fri, Aug 30, 2024 at 20:58 C. Scott Andreas <csco...@icloud.com> wrote: > >> I appreciate this report and would love to work toward the direction it >> recommends. >> >> I’m also familiar with past concerns raised by others with our FQL >> configuration parameters that allow passing shell commands for FQL segment >> archival. >> >> We bias toward ensuring an MBean exists for dynamic modification of yaml >> parameters. When we couple dynamic configuration updates and arbitrary >> shell command execution, we introduce vectors for arbitrary code execution, >> data exfiltration, and data compromise that have a lower bar to achieve >> than local file write. >> >> I agree that we should work toward removing operator-provided shell >> commands in yaml. >> >> For concerns like archival, these seem like areas that Cassandra could >> easily accomplish itself without shelling out to gzip/zstd/lz4-compress a >> file. Introducing a new config structure that declares an archival format, >> accompanying implementations for compression/decompression, and deprecation >> of the prior approach sounds both reasonable and desirable to me. >> >> – Scott >> >> — >> Mobile >> >> On Aug 30, 2024, at 10:25 PM, Bowen Song via dev < >> dev@cassandra.apache.org> wrote: >> >> >> >> I'm not sure what is the concern here. Is it a malicious user exploiting >> this? Or human error with unintended consequences? >> >> For malicious user, in order to exploit this, an attacker needs to be >> able to write to the config file. The config file on Linux by default is >> owned by the root user and has the -rw-r--r-- permission, that means the >> attacker must either gain root access to the system or has the ability to >> write arbitrary file on the filesystem. With either of these permission, >> they can already do almost anything they want (e.g. modify a SUID >> executable file). They wouldn't even need to exploit this to run a script >> or dangerous command. So this sounds like a non-issue to me, at least on >> Linux-based OSes. >> >> For human error, if the operator puts "rm -rf" in it, the software should >> treat it as the operator actually wants to do that. I personally don't like >> software attempting to outsmart human, which often ends up interfering with >> legitimate use cases. The best thing a software can do is log it, so >> there's some traceability if and when things go wrong. >> >> So, IMO, there's nothing wrong with the implementation in Cassandra. >> >> >> On 30/08/2024 17:13, guo Maxwell wrote: >> >> Commitlog has the ability of archive log file, see >> CommitLogArchiver.java >> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java>, >> we can achieve the purpose of archive and restore commitlog by >> configuring archive_command and restore_command in >> commitlog_archiving.properties >> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28> >> .The archive_command and restore_command can be some linux/unix shell >> command. However, I found that the shell command can actually be filled >> with any script, even if "*rm -rf"* .I have tested this situation and it >> finally succeeded with my test file being deleted. >> >> Personally, I think it is a dangerous behavior, because if there are >> no system-level restrictions and users are allowed to do anything in these >> shell commands. So here I want to discuss with you whether it is >> necessary to impose any restrictions on use, or do we need a new way of >> archiving/restoring commitlog? >> >> Of course, before that, I would also like to ask, how many people are >> using archive and restore of commitlog? It seems that the commitlog archive >> code has not been updated for a long time. >> >> I have two ideas. >> One is to make some restrictions on the command context based on the >> existing usage methods, such as strictly only allowing the current cp/mv/ln >> %path to %name.Other redundant strings in the command are not allowed. >> Another one , As I roughly investigated the archive of mysql and pg. They >> do not give users too much space (I am talking about letting users define >> their own archiving command ), and archive directly to a designated >> location. For us, I feel that we can refer to c * Incremental backup of >> sstable, add a hardlink to the commitlog to the specified location, but >> this place may modify the original configuration method, such as setting >> the archive location and restoring location of the node through nodetool >> and deprecate the commitlog_archiving.properties >> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28> >> configuration. >> >> I am just putting forward some views here, and looking forward to your >> feedback. 😀 >> >>