Re: 【DISCUSS】The configuration of Commitlog archiving

guo Maxwell Tue, 03 Sep 2024 02:54:10 -0700

Thank you very much for everyone's replies, they are all very valuable
feedback to me.


I don't really understand what benefit adding restrictions would serve.
> Would it be hard coded in C* itself, or configurable?  If it's
> configurable, then are we just making users enter their commands twice?
> This is meant to be used by an operator, so who's actually protected by an
> allow-list?
>

I agree with you too, so I may prefer to idea 2 with implement commitlog
archiving in c* (not archiving by user defined shell), and deprecate the
commitlog_archiving.properties
<https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
configuration
through which  we  can set the properties of commitlog archiving. This view
may be similar to that of Scott.

If I want to use rclone or aws-cli to archive my commit logs that's my
> prerogative.
>

Yes, it may be very flexible if we set aws-cli in shell. But as I know
cassandra-medusa can also do this , and for me letting other tools to do
this work may be better , for example we can upload more than one log (if
log size is not big ) in a rpc to improve write throughput.

I think we can divide this big task into several subtasks:

   - Add this feature that Stefan mentioned before for commitlog archive
   CASSANDRA-18550 <https://issues.apache.org/jira/browse/CASSANDRA-18550> in
   5.x  and may the original commitlog_archiving.properties  deprecate.
   - Add the feature of archiving for cassandra (commitlog/query log/or
   event sstable) in the long run such as 6.0.

I can prepare a cep if necessary. Looking forward to your feedback.


We can divide this task into several subtasks and complete them step by step



Jordan West <jorda...@gmail.com> 于2024年9月3日周二 00:55写道：

> +1 to Scott’s comments. Once you expose those YAML config params outside
> of a single node which many of us do, this becomes an RCE attack vector.
> Something more structured as Scott proposes, similar to snapshots, would be
> preferred. Would recommend a CEP.
>
> Jordan
>
> On Fri, Aug 30, 2024 at 20:58 C. Scott Andreas <csco...@icloud.com> wrote:
>
>> I appreciate this report and would love to work toward the direction it
>> recommends.
>>
>> I’m also familiar with past concerns raised by others with our FQL
>> configuration parameters that allow passing shell commands for FQL segment
>> archival.
>>
>> We bias toward ensuring an MBean exists for dynamic modification of yaml
>> parameters. When we couple dynamic configuration updates and arbitrary
>> shell command execution, we introduce vectors for arbitrary code execution,
>> data exfiltration, and data compromise that have a lower bar to achieve
>> than local file write.
>>
>> I agree that we should work toward removing operator-provided shell
>> commands in yaml.
>>
>> For concerns like archival, these seem like areas that Cassandra could
>> easily accomplish itself without shelling out to gzip/zstd/lz4-compress a
>> file. Introducing a new config structure that declares an archival format,
>> accompanying implementations for compression/decompression, and deprecation
>> of the prior approach sounds both reasonable and desirable to me.
>>
>> – Scott
>>
>> —
>> Mobile
>>
>> On Aug 30, 2024, at 10:25 PM, Bowen Song via dev <
>> dev@cassandra.apache.org> wrote:
>>
>> 
>>
>> I'm not sure what is the concern here. Is it a malicious user exploiting
>> this? Or human error with unintended consequences?
>>
>> For malicious user, in order to exploit this, an attacker needs to be
>> able to write to the config file. The config file on Linux by default is
>> owned by the root user and has the -rw-r--r-- permission, that means the
>> attacker must either gain root access to the system or has the ability to
>> write arbitrary file on the filesystem. With either of these permission,
>> they can already do almost anything they want (e.g. modify a SUID
>> executable file). They wouldn't even need to exploit this to run a script
>> or dangerous command. So this sounds like a non-issue to me, at least on
>> Linux-based OSes.
>>
>> For human error, if the operator puts "rm -rf" in it, the software should
>> treat it as the operator actually wants to do that. I personally don't like
>> software attempting to outsmart human, which often ends up interfering with
>> legitimate use cases. The best thing a software can do is log it, so
>> there's some traceability if and when things go wrong.
>>
>> So, IMO, there's nothing wrong with the implementation in Cassandra.
>>
>>
>> On 30/08/2024 17:13, guo Maxwell wrote:
>>
>>     Commitlog has the ability of archive  log file, see
>> CommitLogArchiver.java
>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java>,
>> we can achieve the purpose of archive and restore commitlog by
>> configuring archive_command and restore_command in
>> commitlog_archiving.properties
>> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
>> .The archive_command and restore_command can be some linux/unix shell
>> command.  However, I found that the shell command can actually be filled
>> with any script, even if "*rm -rf"* .I have tested this situation and it
>> finally succeeded with my test file being deleted.
>>
>>     Personally, I think it is a dangerous behavior, because if there are
>> no system-level restrictions and users are allowed to do anything in these
>> shell commands. So here I want to discuss with you whether it is
>> necessary to impose any restrictions on use, or do we need a new way of
>> archiving/restoring commitlog?
>>
>> Of course, before that, I would also like to ask, how many people are
>> using archive and restore of commitlog? It seems that the commitlog archive
>> code has not been updated for a long time.
>>
>> I have two ideas.
>> One is to make some restrictions on the command context based on the
>> existing usage methods, such as strictly only allowing the current cp/mv/ln
>> %path to %name.Other redundant strings in the command are not allowed.
>> Another one , As I roughly investigated the archive of mysql and pg. They
>> do not give users too much space (I am talking about letting users define
>> their own archiving command ), and archive directly to a designated
>> location. For us, I feel that we can refer to c * Incremental backup of
>> sstable,  add a hardlink to the commitlog to the specified location, but
>> this place may modify the original configuration method, such as setting
>> the archive location and restoring location of the node through nodetool
>> and deprecate the  commitlog_archiving.properties
>> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
>>  configuration.
>>
>> I am just putting forward some views  here, and looking forward to your
>> feedback. 😀
>>
>>

Re: 【DISCUSS】The configuration of Commitlog archiving

Reply via email to