Re: 【DISCUSS】The configuration of Commitlog archiving

Štefan Miklošovič Tue, 03 Sep 2024 04:15:02 -0700

Scott is right that this is also coming from us having a MBean method which
allows commands to be changed in runtime. The solution to that was that we
can prevent it from changing dynamically by having a configuration
property, which is actually by default set to false so FQL archiving is
ever possible only in case an operator explicitly enables that.


However, even if commands were not modifiable in runtime via JMX and even
an operator has a chance to enable command execution explicitly, that still
does not make it 100% secure because an attacker does not need to change /
modify cassandra.yaml where the script to execute is configure, just the
content of such a script which is executed.

So, introducing a similar property as it was done for FQL would in this
context mean that it would be used for disabling commitlog archiving /
restoring altogether while for FQL it would still do its thing, it would
just not archive it. Whole commitlog archiving / restoring is now based on
some commands to be executed so disabling commands being executed
practically means we disabled this whole feature as such.

We could indeed make it flat out impossible to execute anything but these
scripts might contain some custom logic, like uploading to various cloud
storages (AWS, Azure, GCP or something completely custom), people have
their own "storage solutions" like remove the old logs when new come in
etc. so by disabling this altogether we would make it impossible and users
would need to accommodate that which would break their existing solutions.

What I find confusing is that commitlog_archiving.properties is used both
for restoration AS WELL AS for archiving. If we're ever going to change how
this works, I think that it should be somehow logically split into
archiving and restoring parts.

So, we might introduce a property in cassandra.yaml to disable
commitlog_archiving.properties altogether and we might deprecate
commitlog_archiving.properties way of doing this (still keep it there for
legacy reasons), add a new cassandra.yaml configuration section for that
and there make the archiving and the restoration pluggable. By default we
would provide "cp $from $to" implemented by Cassandra itself without any
process invocation. Then we might eventually drop
commitlog_archiving.properties but if the maintenance of that is cheap I
would just keep it, we would just flip the switch so a new way of doing
that would be preferable and the old way of doing it (via properties) would
need to be explicitly enabled.

On Tue, Sep 3, 2024 at 11:55 AM guo Maxwell <[email protected]> wrote:

> Thank you very much for everyone's replies, they are all very valuable
> feedback to me.
>
> I don't really understand what benefit adding restrictions would serve.
>> Would it be hard coded in C* itself, or configurable?  If it's
>> configurable, then are we just making users enter their commands twice?
>> This is meant to be used by an operator, so who's actually protected by an
>> allow-list?
>>
>
> I agree with you too, so I may prefer to idea 2 with implement commitlog
> archiving in c* (not archiving by user defined shell), and deprecate the
> commitlog_archiving.properties
> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
>  configuration
> through which  we  can set the properties of commitlog archiving. This view
> may be similar to that of Scott.
>
> If I want to use rclone or aws-cli to archive my commit logs that's my
>> prerogative.
>>
>
> Yes, it may be very flexible if we set aws-cli in shell. But as I know
> cassandra-medusa can also do this , and for me letting other tools to do
> this work may be better , for example we can upload more than one log (if
> log size is not big ) in a rpc to improve write throughput.
>
> I think we can divide this big task into several subtasks:
>
>    - Add this feature that Stefan mentioned before for commitlog archive
>    CASSANDRA-18550 <https://issues.apache.org/jira/browse/CASSANDRA-18550> in
>    5.x  and may the original commitlog_archiving.properties  deprecate.
>    - Add the feature of archiving for cassandra (commitlog/query log/or
>    event sstable) in the long run such as 6.0.
>
> I can prepare a cep if necessary. Looking forward to your feedback.
>
>
> We can divide this task into several subtasks and complete them step by
> step
>
>
>
> Jordan West <[email protected]> 于2024年9月3日周二 00:55写道：
>
>> +1 to Scott’s comments. Once you expose those YAML config params outside
>> of a single node which many of us do, this becomes an RCE attack vector.
>> Something more structured as Scott proposes, similar to snapshots, would be
>> preferred. Would recommend a CEP.
>>
>> Jordan
>>
>> On Fri, Aug 30, 2024 at 20:58 C. Scott Andreas <[email protected]>
>> wrote:
>>
>>> I appreciate this report and would love to work toward the direction it
>>> recommends.
>>>
>>> I’m also familiar with past concerns raised by others with our FQL
>>> configuration parameters that allow passing shell commands for FQL segment
>>> archival.
>>>
>>> We bias toward ensuring an MBean exists for dynamic modification of yaml
>>> parameters. When we couple dynamic configuration updates and arbitrary
>>> shell command execution, we introduce vectors for arbitrary code execution,
>>> data exfiltration, and data compromise that have a lower bar to achieve
>>> than local file write.
>>>
>>> I agree that we should work toward removing operator-provided shell
>>> commands in yaml.
>>>
>>> For concerns like archival, these seem like areas that Cassandra could
>>> easily accomplish itself without shelling out to gzip/zstd/lz4-compress a
>>> file. Introducing a new config structure that declares an archival format,
>>> accompanying implementations for compression/decompression, and deprecation
>>> of the prior approach sounds both reasonable and desirable to me.
>>>
>>> – Scott
>>>
>>> —
>>> Mobile
>>>
>>> On Aug 30, 2024, at 10:25 PM, Bowen Song via dev <
>>> [email protected]> wrote:
>>>
>>> 
>>>
>>> I'm not sure what is the concern here. Is it a malicious user exploiting
>>> this? Or human error with unintended consequences?
>>>
>>> For malicious user, in order to exploit this, an attacker needs to be
>>> able to write to the config file. The config file on Linux by default is
>>> owned by the root user and has the -rw-r--r-- permission, that means the
>>> attacker must either gain root access to the system or has the ability to
>>> write arbitrary file on the filesystem. With either of these permission,
>>> they can already do almost anything they want (e.g. modify a SUID
>>> executable file). They wouldn't even need to exploit this to run a script
>>> or dangerous command. So this sounds like a non-issue to me, at least on
>>> Linux-based OSes.
>>>
>>> For human error, if the operator puts "rm -rf" in it, the software
>>> should treat it as the operator actually wants to do that. I personally
>>> don't like software attempting to outsmart human, which often ends up
>>> interfering with legitimate use cases. The best thing a software can do is
>>> log it, so there's some traceability if and when things go wrong.
>>>
>>> So, IMO, there's nothing wrong with the implementation in Cassandra.
>>>
>>>
>>> On 30/08/2024 17:13, guo Maxwell wrote:
>>>
>>>     Commitlog has the ability of archive  log file, see
>>> CommitLogArchiver.java
>>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java>,
>>> we can achieve the purpose of archive and restore commitlog by
>>> configuring archive_command and restore_command in
>>> commitlog_archiving.properties
>>> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
>>> .The archive_command and restore_command can be some linux/unix shell
>>> command.  However, I found that the shell command can actually be
>>> filled with any script, even if "*rm -rf"* .I have tested this
>>> situation and it finally succeeded with my test file being deleted.
>>>
>>>     Personally, I think it is a dangerous behavior, because if there are
>>> no system-level restrictions and users are allowed to do anything in these
>>> shell commands. So here I want to discuss with you whether it is
>>> necessary to impose any restrictions on use, or do we need a new way of
>>> archiving/restoring commitlog?
>>>
>>> Of course, before that, I would also like to ask, how many people are
>>> using archive and restore of commitlog? It seems that the commitlog archive
>>> code has not been updated for a long time.
>>>
>>> I have two ideas.
>>> One is to make some restrictions on the command context based on the
>>> existing usage methods, such as strictly only allowing the current cp/mv/ln
>>> %path to %name.Other redundant strings in the command are not allowed.
>>> Another one , As I roughly investigated the archive of mysql and pg.
>>> They do not give users too much space (I am talking about letting users
>>> define their own archiving command ), and archive directly to a designated
>>> location. For us, I feel that we can refer to c * Incremental backup of
>>> sstable,  add a hardlink to the commitlog to the specified location, but
>>> this place may modify the original configuration method, such as setting
>>> the archive location and restoring location of the node through nodetool
>>> and deprecate the  commitlog_archiving.properties
>>> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
>>>  configuration.
>>>
>>> I am just putting forward some views  here, and looking forward to your
>>> feedback. 😀
>>>
>>>

Re: 【DISCUSS】The configuration of Commitlog archiving

Reply via email to