Re: 【DISCUSS】The configuration of Commitlog archiving

guo Maxwell Wed, 18 Sep 2024 01:08:28 -0700

Do you have any new updates on this DISCUSS thread?

- The reason this pattern is popular is it allows extension of
functionality ahead of the database. Some people copy to a NAS/SAN. Some
people copy to S3. Some people copy to their own object storage that isn’t
s3 compatible. “Compress and move” is super limiting, because “move” varies
remarkably between environments.


Yes, this approach is indeed very flexible, but wouldn't it be more
appropriate to decouple archiving to heterogeneous storage and leave that to
other systems that specialize in it, while we only do compression and copying
(file linking, like SSTable incremental backup)?
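The "compression and copying only" half could look roughly like this minimal sketch. It assumes a hypothetical `LinkingArchiver` class (not existing Cassandra code) that hard-links a finished segment into an archive directory fixed at construction, in the spirit of SSTable incremental backups. Hard links only work within one filesystem, so the sketch falls back to a plain copy; moving the files to S3, a NAS, etc. is left to an external tool.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch, not Cassandra's CommitLogArchiver: "archive" a finished
// commit log segment by hard-linking it into a fixed directory, in the spirit
// of SSTable incremental backups. Uploading elsewhere is left to other tools.
final class LinkingArchiver {
    private final Path archiveDir; // set once at construction, never changed at runtime

    LinkingArchiver(Path archiveDir) {
        this.archiveDir = archiveDir;
    }

    /** Links (or, failing that, copies) the segment into the archive dir. */
    Path archive(Path segment) throws IOException {
        Files.createDirectories(archiveDir);
        Path target = archiveDir.resolve(segment.getFileName());
        try {
            // Hard link: no data is copied; both names point to the same file.
            return Files.createLink(target, segment);
        } catch (FileAlreadyExistsException e) {
            throw e; // never silently overwrite an existing archive entry
        } catch (UnsupportedOperationException | IOException e) {
            // e.g. the archive dir is on another filesystem: fall back to a copy
            return Files.copy(segment, target, StandardCopyOption.COPY_ATTRIBUTES);
        }
    }
}
```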


On Thu, Sep 5, 2024 at 04:18 Štefan Miklošovič <[email protected]> wrote:

>
> On Wed, Sep 4, 2024 at 8:34 PM Jon Haddad <[email protected]> wrote:
>
>> I thought about this a bit over the last few days, and there are actually
>> quite a few problems present that would need to be addressed.
>>
>> *Insecure JMX*
>>
>> First off - if someone has access to JMX, the entire system is already
>> compromised.  A bad actor can mess with the cluster topology, truncate
>> tables, and do a ton of other disruptive stuff.  But if we're going to go
>> down this path I think we should apply your logic consistently to avoid
>> creating a "solution" that has the same "problem" as we do now.  I use
>> quotes because I'm not entirely convinced the root cause of the problem is
>> enabling some shell access, but I'll entertain it for the sake of the
>> discussion.
>>
>> *Dynamic Configuration and Shell Scripts*
>>
>> Let's pretend that somehow an open JMX isn't already a *massive* security
>> flaw by itself.  Once an attacker has control of a system, the next phase
>> of the attack relies on them dynamically changing the configuration to
>> point to a different shell script, or to execute arbitrary shell scripts.
>>
>> I agree with the general idea that we don't want this - so in my mind the
>> necessary solution here is to disable the ability to change the commit log
>> archiving behavior at runtime.
>>
>> The idea that commit log archiving (and many other config settings) would
>> be dynamically configurable is a massive security flaw that should be
>> disallowed.  If you want to take this a step further and claim there's a
>> flaw with shell scripts in general, I'll even entertain that for a minute,
>> but we need to examine if the proposed solution of moving code to Java
>> actually solves the problem.
>>
>> *Dynamic Configuration and Java Code*
>>
>> Let's say we've removed the ability to use shell scripts, and we've
>> gotten people to rewrite their shell code in Java, but we've left
>> the dynamic configuration in.  Going back to my original email, I mentioned
>> copying commit logs off the node and into an object store.  If someone is
>> able to change the parameter dynamically at runtime, they could just as
>> easily point to a public S3 bucket and commit logs would be archived there
>> which is just as bad as the shell version.  So if we are to convert this
>> functionality to Java, we should also be making best practice
>> recommendations on what users should and should not do.
>>
>
> I think what you meant here is that if we allowed people to provide a
> pluggable way of copying stuff over, and they coded it up, put that JAR on
> the class path, (re)started Cassandra etc., then someone might reconfigure
> this custom solution at runtime? Yeah, we do not want this. We can make it
> pluggable, but not reconfigurable. If it is pluggable and not
> reconfigurable, then to replace it with something else an attacker would
> basically need to restart Cassandra with a rogue JAR on the class path. If
> an attacker can do that, the system is beyond any salvation and completely
> compromised anyway.
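"Pluggable but not reconfigurable" could look roughly like the following sketch: the implementation class is named in configuration read once at startup, instantiated via reflection, and held in a final field with no setter and no MBean operation. `ArchiverPlugin`, `NoopArchiver` and `ArchiverHolder` are hypothetical names, not existing Cassandra types.

```java
import java.nio.file.Path;

// Hypothetical plug-in point (not an existing Cassandra interface).
interface ArchiverPlugin {
    void archive(Path segment) throws Exception;
}

// Example plug-in; a real one might copy to a NAS or upload to object storage.
final class NoopArchiver implements ArchiverPlugin {
    public NoopArchiver() {}

    @Override
    public void archive(Path segment) {
        // intentionally does nothing
    }
}

// The implementation is resolved exactly once, from a class name read at startup.
// There is no setter and no MBean operation, so replacing the archiver means
// restarting the node with a different class (and JAR) on the class path.
final class ArchiverHolder {
    private final ArchiverPlugin archiver;

    private ArchiverHolder(ArchiverPlugin archiver) {
        this.archiver = archiver;
    }

    static ArchiverHolder load(String className) {
        try {
            Class<? extends ArchiverPlugin> cls =
                Class.forName(className).asSubclass(ArchiverPlugin.class);
            return new ArchiverHolder(cls.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot load archiver " + className, e);
        }
    }

    ArchiverPlugin archiver() {
        return archiver;
    }
}
```

Failing at startup on a bad class name (rather than at the first archive attempt) keeps misconfiguration visible.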
>
>
>>
>>
>> *Apply All Operational Best Practices*
>>
>> There's been a variety of examples of how a user can further compromise a
>> machine once they have JMX, working in tandem with shell scripts, but I
>> hope at this point you can see that the issue is fundamentally more complex
>> than simply disallowing shell scripts.  The issue is present in the Java
>> examples as well, and is strongly tied to the issue of dynamic config.  If
>> we're to design this the "right" way, I think we'd want these properties:
>>
>> * Commit log archiving should only have the ability to compress and move
>> files to a staging location
>> * Once the files are moved to the staging location, the file should be
>> moved somewhere else by a script NOT run as the C* user.
>>
>> * The commit log archive configuration should not be dynamically
>> updatable, nor should any config affecting directories
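A sketch of the first two properties, assuming a hypothetical `StagingArchiver`: Cassandra only gzips a finished segment into a staging directory that is fixed at startup (no setters), and a separate process running as a different user moves files on from there. The write-to-temp-then-rename step is an added assumption so the external mover never picks up a half-written file; that mover should ignore `*.tmp` names. Directory ownership and permissions for the staging location are not handled here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPOutputStream;

// Hypothetical: the only thing Cassandra does is compress a segment into a
// staging dir chosen at startup. There are no setters, so nothing can be
// redirected over JMX; shipping files elsewhere is another user's job.
final class StagingArchiver {
    private final Path stagingDir;

    StagingArchiver(Path stagingDir) {
        this.stagingDir = stagingDir;
    }

    /** Gzips the segment into the staging dir and returns the staged file. */
    Path stage(Path segment) throws IOException {
        Files.createDirectories(stagingDir);
        String name = segment.getFileName() + ".gz";
        // Write under a temporary name first, then rename atomically.
        Path tmp = stagingDir.resolve(name + ".tmp");
        try (InputStream in = Files.newInputStream(segment);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
            in.transferTo(out);
        }
        return Files.move(tmp, stagingDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }
}
```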
>>
>
> This would essentially copy the logic we have for snapshots, as Jordan
> mentioned. I do not mind having it like that. It is a good question what
> exactly we need it to be reconfigurable for. Why is it like that? Do people
> not want to restart a whole cluster of 100 nodes when the destination of
> the archived commit logs changes? How often does that happen, that we need
> to expose ourselves to the problems related to it?
>
>
>>
>> Moving the shell configuration to Java code is a half measure that's only
>> solving a tiny problem in a massive chain of events and security holes.
>>
>> Jon
>>
>>
>>
>> On Tue, Sep 3, 2024 at 4:15 AM Štefan Miklošovič <[email protected]>
>> wrote:
>>
>>> Scott is right that this also comes from us having an MBean method
>>> which allows commands to be changed at runtime. The solution to that was
>>> a configuration property that prevents them from being changed
>>> dynamically; it is set to false by default, so this is only possible if
>>> an operator explicitly enables it.
>>>
>>> However, even if commands were not modifiable at runtime via JMX, and
>>> even if an operator had to enable command execution explicitly, that
>>> still would not make it 100% secure, because an attacker does not need to
>>> change / modify cassandra.yaml, where the script to execute is configured,
>>> just the content of the script being executed.
>>>
>>> So, introducing a property similar to the one added for FQL would, in
>>> this context, mean disabling commitlog archiving / restoring altogether,
>>> whereas for FQL the feature still does its thing, it just does not
>>> archive. Commitlog archiving / restoring is entirely based on executing
>>> commands, so disabling command execution practically disables this whole
>>> feature.
>>>
>>> We could indeed make it flat out impossible to execute anything, but
>>> these scripts might contain custom logic, like uploading to various cloud
>>> storage services (AWS, Azure, GCP or something completely custom), and
>>> people have their own "storage solutions", like removing old logs when new
>>> ones come in. Disabling this altogether would make all of that impossible,
>>> and users would have to adapt, which would break their existing solutions.
>>>
>>> What I find confusing is that commitlog_archiving.properties is used
>>> both for restoration AS WELL AS for archiving. If we're ever going to
>>> change how this works, I think that it should be somehow logically split
>>> into archiving and restoring parts.
>>>
>>> So, we might introduce a property in cassandra.yaml to disable
>>> commitlog_archiving.properties altogether, deprecate the
>>> commitlog_archiving.properties way of doing this (still keeping it for
>>> legacy reasons), add a new cassandra.yaml configuration section for it,
>>> and make archiving and restoration pluggable there. By default we would
>>> provide "cp $from $to" implemented by Cassandra itself, without any
>>> process invocation. Eventually we might drop commitlog_archiving.properties,
>>> but if maintaining it is cheap I would just keep it; we would just flip the
>>> switch so the new way is preferred and the old way (via properties) would
>>> need to be explicitly enabled.
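A minimal sketch of that proposal, with archiving and restoring split into two separate plug-in points and a default that performs the "cp $from $to" step in-process via `java.nio.file.Files.copy` instead of spawning a shell. The interface and class names are hypothetical, and how they would be wired into a new cassandra.yaml section is left open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical plug-in points: archiving and restoring are configured separately
// instead of sharing one commitlog_archiving.properties file.
interface SegmentArchiver {
    /** Archives a finished segment and returns where it was stored. */
    Path archive(Path segment) throws IOException;
}

interface SegmentRestorer {
    /** Copies an archived segment ("from") to where replay expects it ("to"). */
    void restore(Path from, Path to) throws IOException;
}

// Default archiver: the equivalent of "cp $from $to", done in-process, no shell.
final class CopySegmentArchiver implements SegmentArchiver {
    private final Path archiveDir;

    CopySegmentArchiver(Path archiveDir) {
        this.archiveDir = archiveDir;
    }

    @Override
    public Path archive(Path segment) throws IOException {
        Files.createDirectories(archiveDir);
        // No REPLACE_EXISTING: an already archived segment is never overwritten.
        return Files.copy(segment, archiveDir.resolve(segment.getFileName()));
    }
}

// Default restorer: also a plain in-process copy.
final class CopySegmentRestorer implements SegmentRestorer {
    @Override
    public void restore(Path from, Path to) throws IOException {
        Path parent = to.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING);
    }
}
```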
>>>
>>> On Tue, Sep 3, 2024 at 11:55 AM guo Maxwell <[email protected]>
>>> wrote:
>>>
>>>> Thank you very much for everyone's replies, they are all very valuable
>>>> feedback to me.
>>>>
>>>> I don't really understand what benefit adding restrictions would
>>>>> serve.  Would it be hard coded in C* itself, or configurable?  If it's
>>>>> configurable, then are we just making users enter their commands twice?
>>>>> This is meant to be used by an operator, so who's actually protected by an
>>>>> allow-list?
>>>>>
>>>>
>>>> I agree with you too, so I prefer idea 2: implementing commitlog
>>>> archiving in C* itself (rather than archiving via user-defined shell
>>>> commands), and deprecating the commitlog_archiving.properties
>>>> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
>>>> configuration through which we set the commitlog archiving properties.
>>>> This view may be similar to Scott's.
>>>>
>>>> If I want to use rclone or aws-cli to archive my commit logs that's my
>>>>> prerogative.
>>>>>
>>>>
>>>> Yes, putting aws-cli in a shell command is very flexible. But as far as I
>>>> know, cassandra-medusa can also do this, and to me letting other tools do
>>>> this work may be better; for example, such a tool could upload more than
>>>> one log (if the logs are small) in a single RPC to improve throughput.
>>>>
>>>> I think we can divide this big task into several subtasks:
>>>>
>>>>    - Add the commitlog archiving feature Stefan mentioned before,
>>>>    CASSANDRA-18550
>>>>    <https://issues.apache.org/jira/browse/CASSANDRA-18550>, in 5.x, and
>>>>    possibly deprecate the original commitlog_archiving.properties.
>>>>    - Add archiving to Cassandra (commitlog/query log/or even SSTables) in
>>>>    the longer run, e.g. 6.0.
>>>>
>>>> I can prepare a CEP if necessary. Looking forward to your feedback.
>>>>
>>>>
>>>> We can divide this task into several subtasks and complete them step by
>>>> step
>>>>
>>>>
>>>>
On Tue, Sep 3, 2024 at 00:55 Jordan West <[email protected]> wrote:
>>>>
>>>>> +1 to Scott’s comments. Once you expose those YAML config params
>>>>> outside of a single node, which many of us do, this becomes an RCE attack
>>>>> vector. Something more structured, as Scott proposes and similar to
>>>>> snapshots, would be preferable. I would recommend a CEP.
>>>>>
>>>>> Jordan
>>>>>
>>>>> On Fri, Aug 30, 2024 at 20:58 C. Scott Andreas <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I appreciate this report and would love to work toward the direction
>>>>>> it recommends.
>>>>>>
>>>>>> I’m also familiar with past concerns raised by others with our FQL
>>>>>> configuration parameters that allow passing shell commands for FQL 
>>>>>> segment
>>>>>> archival.
>>>>>>
>>>>>> We bias toward ensuring an MBean exists for dynamic modification of
>>>>>> yaml parameters. When we couple dynamic configuration updates and arbitrary
>>>>>> shell command execution, we introduce vectors for arbitrary code execution,
>>>>>> data exfiltration, and data compromise that have a lower bar to achieve
>>>>>> than local file write.
>>>>>>
>>>>>> I agree that we should work toward removing operator-provided shell
>>>>>> commands in yaml.
>>>>>>
>>>>>> For concerns like archival, these seem like areas that Cassandra
>>>>>> could easily accomplish itself without shelling out to
>>>>>> gzip/zstd/lz4-compress a file. Introducing a new config structure that
>>>>>> declares an archival format, accompanying implementations for
>>>>>> compression/decompression, and deprecation of the prior approach sounds
>>>>>> both reasonable and desirable to me.
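A sketch of a declared archival format with in-process compression, under stated assumptions: the `ArchivalFormat` enum and the yaml key mentioned in its comment are hypothetical, and only gzip is shown because it ships with the JDK (`java.util.zip`); zstd or lz4 would need an external library. Parsing the configured value once at startup means an unknown format fails fast rather than at archive time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical config value for a new archival section, e.g. a yaml key such as
// `archive_format: gzip` (the key name is made up). Only JDK codecs are shown.
enum ArchivalFormat {
    NONE {
        @Override OutputStream wrap(OutputStream out) { return out; }
        @Override InputStream unwrap(InputStream in) { return in; }
    },
    GZIP {
        @Override OutputStream wrap(OutputStream out) throws IOException { return new GZIPOutputStream(out); }
        @Override InputStream unwrap(InputStream in) throws IOException { return new GZIPInputStream(in); }
    };

    abstract OutputStream wrap(OutputStream out) throws IOException;

    abstract InputStream unwrap(InputStream in) throws IOException;

    /** Parses the configured value, e.g. "gzip"; unknown formats throw. */
    static ArchivalFormat fromConfig(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (OutputStream out = wrap(buf)) {
            out.write(data);
        } // closing the stream writes the gzip trailer
        return buf.toByteArray();
    }

    byte[] decompress(byte[] data) throws IOException {
        try (InputStream in = unwrap(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        }
    }
}
```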
>>>>>>
>>>>>> – Scott
>>>>>>
>>>>>> —
>>>>>> Mobile
>>>>>>
>>>>>> On Aug 30, 2024, at 10:25 PM, Bowen Song via dev <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> I'm not sure what the concern is here. Is it a malicious user
>>>>>> exploiting this? Or human error with unintended consequences?
>>>>>>
>>>>>> For a malicious user, in order to exploit this, an attacker needs to be
>>>>>> able to write to the config file. The config file on Linux by default is
>>>>>> owned by the root user and has the -rw-r--r-- permission, which means the
>>>>>> attacker must either gain root access to the system or have the ability to
>>>>>> write arbitrary files on the filesystem. With either of these permissions,
>>>>>> they can already do almost anything they want (e.g. modify a SUID
>>>>>> executable file). They wouldn't even need to exploit this to run a script
>>>>>> or dangerous command. So this sounds like a non-issue to me, at least on
>>>>>> Linux-based OSes.
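The argument above rests on the archiving config not being writable by anyone but its owner. A hypothetical startup check (not existing Cassandra behavior) that verifies the group/others part of that assumption could look like this; it does not check that the owner is root, and `Files.getPosixFilePermissions` only works on POSIX filesystems.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

// Hypothetical startup check: the "attacker needs root or arbitrary file write"
// argument only holds if group and others cannot write the archiving config.
final class ConfigPermissionCheck {
    /** True if the group or others may write the file. POSIX filesystems only. */
    static boolean writableByNonOwner(Path config) throws IOException {
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(config);
        return perms.contains(PosixFilePermission.GROUP_WRITE)
            || perms.contains(PosixFilePermission.OTHERS_WRITE);
    }
}
```

A node could log a warning, or refuse to run archive commands, when this returns true.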
>>>>>>
>>>>>> For human error, if the operator puts "rm -rf" in it, the software
>>>>>> should treat it as something the operator actually wants to do. I
>>>>>> personally don't like software attempting to outsmart humans, which often
>>>>>> ends up interfering with legitimate use cases. The best thing software
>>>>>> can do is log it, so there's some traceability if and when things go wrong.
>>>>>>
>>>>>> So, IMO, there's nothing wrong with the implementation in Cassandra.
>>>>>>
>>>>>>
>>>>>> On 30/08/2024 17:13, guo Maxwell wrote:
>>>>>>
>>>>>>     Commitlog has the ability to archive log files, see
>>>>>> CommitLogArchiver.java
>>>>>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java>.
>>>>>> We can archive and restore commitlogs by configuring archive_command and
>>>>>> restore_command in commitlog_archiving.properties
>>>>>> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>.
>>>>>> The archive_command and restore_command can be any Linux/Unix shell
>>>>>> command. However, I found that the shell command can actually be filled
>>>>>> with any script, even "rm -rf". I tested this and it succeeded: my test
>>>>>> file was deleted.
>>>>>>
>>>>>>     Personally, I think this is dangerous behavior, because there are no
>>>>>> system-level restrictions and users are allowed to do anything in these
>>>>>> shell commands. So I want to discuss with you whether it is necessary to
>>>>>> impose restrictions on its use, or whether we need a new way of
>>>>>> archiving/restoring commitlogs.
>>>>>>
>>>>>> Of course, before that, I would also like to ask: how many people are
>>>>>> using commitlog archiving and restoring? It seems that the commitlog
>>>>>> archive code has not been updated for a long time.
>>>>>>
>>>>>> I have two ideas.
>>>>>> One is to restrict the command content based on the existing usage,
>>>>>> such as strictly only allowing the current cp/mv/ln %path to %name form;
>>>>>> any other strings in the command would not be allowed.
>>>>>> The other: I roughly investigated the archiving of MySQL and PostgreSQL.
>>>>>> They do not give users much room (I mean letting users define their own
>>>>>> archiving command), and archive directly to a designated location. For
>>>>>> us, I feel we could follow the C* incremental backup of SSTables and add
>>>>>> a hard link to the commitlog in a specified location. But this may
>>>>>> require changing the original configuration method, for example setting
>>>>>> the node's archive and restore locations through nodetool, and
>>>>>> deprecating the commitlog_archiving.properties
>>>>>> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
>>>>>> configuration.
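Idea 1 could look roughly like this allow-list check, run once when the properties file is loaded. `ArchiveCommandValidator` is a hypothetical name, and the exact grammar (which binaries, which path characters) is an assumption chosen so that the `/bin/ln %path /backup/%name` example from commitlog_archiving.properties passes. As noted elsewhere in this thread, a grammar this strict also rules out legitimate uses such as uploading to object storage.

```java
import java.util.regex.Pattern;

// Hypothetical validator for idea 1 (not existing Cassandra code). Accepts only
// "<cp|mv|ln> %path /<dir>/%name", optionally as /bin/... or /usr/bin/..., and
// rejects flags, shell metacharacters (; | & $ ` > <), spaces in paths and "..".
final class ArchiveCommandValidator {
    private static final Pattern ALLOWED = Pattern.compile(
        "(?!.*\\.\\.)"                      // no parent-directory traversal anywhere
        + "(?:/(?:usr/)?bin/)?(?:cp|mv|ln)"  // the only permitted programs
        + " +%path +"                        // source placeholder
        + "/[A-Za-z0-9._/-]*%name");        // absolute destination ending in %name

    static boolean isAllowed(String command) {
        // matches() requires the whole string to match, so nothing may follow %name
        return ALLOWED.matcher(command.trim()).matches();
    }
}
```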
>>>>>>
>>>>>> I am just putting forward some views  here, and looking forward to
>>>>>> your feedback. 😀
>>>>>>
>>>>>>
