Re: 【DISCUSS】The configuration of Commitlog archiving

Jon Haddad Tue, 24 Sep 2024 10:16:26 -0700

I categorize this as Not a Problem.

So far the two main justifications have been operator error and a fairly
convoluted series of steps to an already compromised database.  I don't
view either of them as a reason to inconvenience users.


If someone wants to avoid the shell command, what's wrong with CDC?

Jon




On Tue, Sep 24, 2024 at 9:21 AM guo Maxwell <[email protected]> wrote:

> Hello，are there any new updates？🤔
>
> guo Maxwell <[email protected]>于2024年9月18日 周三下午4:06写道：
>
>> Do you have any new updates  on this DISCUSS ?
>>
>> - The reason this pattern is popular is it allows extension of
>> functionality ahead of the database. Some people copy to a NAS/SAN. Some
>> people copy to S3. Some people copy to their own object storage that isn’t
>> s3 compatible. “Compress and move” is super limiting, because “move” varies
>> remarkably between environments.
>>
>> Yes, it is indeed very flexible to use this way, but would it be more
>> appropriate to decouple the file archiving to heterogeneous storage and
>> leave it to other systems to handle it specifically? And we only do
>> compression and copying (file linking like sstable incremental backup)?
>>
>>
>> Štefan Miklošovič <[email protected]> 于2024年9月5日周四 04:18写道：
>>
>>>
>>> On Wed, Sep 4, 2024 at 8:34 PM Jon Haddad <[email protected]> wrote:
>>>
>>>> I thought about this a bit over the last few days, and there's actually
>>>> quite a few problems present that would need to be addressed.
>>>>
>>>> *Insecure JMX*
>>>>
>>>> First off - if someone has access to JMX, the entire system is already
>>>> compromised.  A bad actor can mess with the cluster topology, truncate
>>>> tables, and do a ton of other disruptive stuff.  But if we're going to go
>>>> down this path I think we should apply your logic consistently to avoid
>>>> creating a "solution" that has the same "problem" as we do now.  I use
>>>> quotes because I'm not entirely convinced the root cause of the problem is
>>>> enabling some shell access, but I'll entertain it for the sake of the
>>>> discussion.
>>>>
>>>> *Dynamic Configuration and Shell Scripts*
>>>>
>>>> Let's pretend that somehow an open JMX isn't already a *massive*
>>>> security flaw by itself.  Once an attacker has control of a system, the
>>>> next phase of the attack relies on them dynamically changing the
>>>> configuration to point to a different shell script, or to execute arbitrary
>>>> shell scripts.
>>>>
>>> I agree with the general idea that we don't want this - so in my mind
>>>> the necessary solution here is to disable the ability to change the commit
>>>> log archiving behavior at runtime.
>>>>
>>>> The idea that commit log archiving (and many other config settings)
>>>> would be dynamically configurable is a massive security flaw that should be
>>>> disallowed.  If you want to take this a step further and claim there's a
>>>> flaw with shell scripts in general, I'll even entertain that for a minute,
>>>> but we need to examine if the proposed solution of moving code to Java
>>>> actually solves the problem.
>>>>
>>>> *Dynamic Configuration and Java Code*
>>>>
>>>> Let's say we've removed the ability to use shell scripts, and we've
>>>> gotten people to rewrite their shell code with java code, but we've left
>>>> the dynamic configuration in.  Going back to my original email, I mentioned
>>>> copying commit logs off the node and into an object store.  If someone is
>>>> able to change the parameter dynamically at runtime, they could just as
>>>> easily point to a public S3 bucket and commit logs would be archived there
>>>> which is just as bad as the shell version.  So if we are to convert this
>>>> functionality to Java, we should also be making best practice
>>>> recommendations on what users should and should not do.
>>>>
>>>
>>> I think what you meant here is that if we allowed people to provide a
>>> pluggable way how stuff is copied over and they would code it up, put that
>>> JAR on the class path, Cassandra (re)started etc, then someone might
>>> reconfigure this custom solution in runtime? Yeah, we do not want this. We
>>> can make it pluggable, but not reconfigurable. To have it pluggable and not
>>> reconfigurable, then to replace it with something else, an attacker would
>>> basically need to restart Cassandra with a rogue JAR on the class path. In
>>> order to do that, I think that at this point it would be beyond any
>>> salvation and the system is completely compromised anyway.
>>>
>>>
>>>>
>>>>
>>>> *Apply All Operational Best Practices*
>>>>
>>>> There's been a variety of examples of how a user can further compromise
>>>> a machine once they have JMX, working in tandem with shell scripts, but I
>>>> hope at this point you can see that the issue is fundamentally more complex
>>>> than simply disallowing shell scripts.  The issue is present in the Java
>>>> examples as well, and is strongly tied to the issue of dynamic config.  If
>>>> we're to design this the "right" way, I think we'd want these properties:
>>>>
>>>> * Commit log archiving should only have the ability to compress and
>>>> move files to a staging location
>>>> * Once the files are moved to the staging location, the file should be
>>>> moved somewhere else by a script NOT run as the C* user.
>>>>
>>> * The commit log archive configuration should not be dynamically
>>>> updatable, nor should any config affecting directories
>>>>
>>>
>>> This would essentially copy the logic we have for snapshots as Jordan
>>> mentioned. I do not mind having it like that. It is a good question for
>>> what exactly we need to have it reconfigurable. Why is it like that? People
>>> do not want to restart a whole cluster consisting of 100 nodes when the
>>> destination of the archived commit logs changed? How often is this
>>> happening that we need to expose ourselves to the problems related to that?
>>>
>>>
>>>>
>>>> Moving the scell configuration to Java code is a half measure that's
>>>> only solving a tiny problem in a massive chain of events and security
>>>> holes.
>>>>
>>>> Jon
>>>>
>>>>
>>>>
>>>> On Tue, Sep 3, 2024 at 4:15 AM Štefan Miklošovič <
>>>> [email protected]> wrote:
>>>>
>>>>> Scott is right that this is also coming from us having a MBean method
>>>>> which allows commands to be changed in runtime. The solution to that was
>>>>> that we can prevent it from changing dynamically by having a configuration
>>>>> property, which is actually by default set to false so FQL archiving is
>>>>> ever possible only in case an operator explicitly enables that.
>>>>>
>>>>> However, even if commands were not modifiable in runtime via JMX and
>>>>> even an operator has a chance to enable command execution explicitly, that
>>>>> still does not make it 100% secure because an attacker does not need to
>>>>> change / modify cassandra.yaml where the script to execute is configure,
>>>>> just the content of such a script which is executed.
>>>>>
>>>>> So, introducing a similar property as it was done for FQL would in
>>>>> this context mean that it would be used for disabling commitlog archiving 
>>>>> /
>>>>> restoring altogether while for FQL it would still do its thing, it would
>>>>> just not archive it. Whole commitlog archiving / restoring is now based on
>>>>> some commands to be executed so disabling commands being executed
>>>>> practically means we disabled this whole feature as such.
>>>>>
>>>>> We could indeed make it flat out impossible to execute anything but
>>>>> these scripts might contain some custom logic, like uploading to various
>>>>> cloud storages (AWS, Azure, GCP or something completely custom), people
>>>>> have their own "storage solutions" like remove the old logs when new come
>>>>> in etc. so by disabling this altogether we would make it impossible and
>>>>> users would need to accommodate that which would break their existing
>>>>> solutions.
>>>>>
>>>>> What I find confusing is that commitlog_archiving.properties is used
>>>>> both for restoration AS WELL AS for archiving. If we're ever going to
>>>>> change how this works, I think that it should be somehow logically split
>>>>> into archiving and restoring parts.
>>>>>
>>>>> So, we might introduce a property in cassandra.yaml to disable
>>>>> commitlog_archiving.properties altogether and we might deprecate
>>>>> commitlog_archiving.properties way of doing this (still keep it there for
>>>>> legacy reasons), add a new cassandra.yaml configuration section for that
>>>>> and there make the archiving and the restoration pluggable. By default we
>>>>> would provide "cp $from $to" implemented by Cassandra itself without any
>>>>> process invocation. Then we might eventually drop
>>>>> commitlog_archiving.properties but if the maintenance of that is cheap I
>>>>> would just keep it, we would just flip the switch so a new way of doing
>>>>> that would be preferable and the old way of doing it (via properties) 
>>>>> would
>>>>> need to be explicitly enabled.
>>>>>
>>>>> On Tue, Sep 3, 2024 at 11:55 AM guo Maxwell <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thank you very much for everyone's replies, they are all very
>>>>>> valuable feedback to me.
>>>>>>
>>>>>> I don't really understand what benefit adding restrictions would
>>>>>>> serve.  Would it be hard coded in C* itself, or configurable?  If it's
>>>>>>> configurable, then are we just making users enter their commands twice?
>>>>>>> This is meant to be used by an operator, so who's actually protected by 
>>>>>>> an
>>>>>>> allow-list?
>>>>>>>
>>>>>>
>>>>>> I agree with you too, so I may prefer to idea 2 with implement
>>>>>> commitlog archiving in c* (not archiving by user defined shell),
>>>>>> and deprecate the  commitlog_archiving.properties
>>>>>> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
>>>>>>  configuration
>>>>>> through which  we  can set the properties of commitlog archiving. This 
>>>>>> view
>>>>>> may be similar to that of Scott.
>>>>>>
>>>>>> If I want to use rclone or aws-cli to archive my commit logs that's
>>>>>>> my prerogative.
>>>>>>>
>>>>>>
>>>>>> Yes, it may be very flexible if we set aws-cli in shell. But as I
>>>>>> know cassandra-medusa can also do this , and for me letting other tools 
>>>>>> to
>>>>>> do this work may be better , for example we can upload more than one log
>>>>>> (if log size is not big ) in a rpc to improve write throughput.
>>>>>>
>>>>>> I think we can divide this big task into several subtasks:
>>>>>>
>>>>>>    - Add this feature that Stefan mentioned before for commitlog
>>>>>>    archive CASSANDRA-18550
>>>>>>    <https://issues.apache.org/jira/browse/CASSANDRA-18550> in 5.x
>>>>>>    and may the original commitlog_archiving.properties  deprecate.
>>>>>>    - Add the feature of archiving for cassandra (commitlog/query
>>>>>>    log/or event sstable) in the long run such as 6.0.
>>>>>>
>>>>>> I can prepare a cep if necessary. Looking forward to your feedback.
>>>>>>
>>>>>>
>>>>>> We can divide this task into several subtasks and complete them step
>>>>>> by step
>>>>>>
>>>>>>
>>>>>>
>>>>>> Jordan West <[email protected]> 于2024年9月3日周二 00:55写道：
>>>>>>
>>>>>>> +1 to Scott’s comments. Once you expose those YAML config params
>>>>>>> outside of a single node which many of us do, this becomes an RCE attack
>>>>>>> vector. Something more structured as Scott proposes, similar to 
>>>>>>> snapshots,
>>>>>>> would be preferred. Would recommend a CEP.
>>>>>>>
>>>>>>> Jordan
>>>>>>>
>>>>>>> On Fri, Aug 30, 2024 at 20:58 C. Scott Andreas <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I appreciate this report and would love to work toward the
>>>>>>>> direction it recommends.
>>>>>>>>
>>>>>>>> I’m also familiar with past concerns raised by others with our FQL
>>>>>>>> configuration parameters that allow passing shell commands for FQL 
>>>>>>>> segment
>>>>>>>> archival.
>>>>>>>>
>>>>>>>> We bias toward ensuring an MBean exists for dynamic modification of
>>>>>>>> yaml parameters. When we couple dynamic configuration updates and 
>>>>>>>> arbitrary
>>>>>>>> shell command execution, we introduce vectors for arbitrary code 
>>>>>>>> execution,
>>>>>>>> data exfiltration, and data compromise that have a lower bar to achieve
>>>>>>>> than local file write.
>>>>>>>>
>>>>>>>> I agree that we should work toward removing operator-provided shell
>>>>>>>> commands in yaml.
>>>>>>>>
>>>>>>>> For concerns like archival, these seem like areas that Cassandra
>>>>>>>> could easily accomplish itself without shelling out to
>>>>>>>> gzip/zstd/lz4-compress a file. Introducing a new config structure that
>>>>>>>> declares an archival format, accompanying implementations for
>>>>>>>> compression/decompression, and deprecation of the prior approach sounds
>>>>>>>> both reasonable and desirable to me.
>>>>>>>>
>>>>>>>> – Scott
>>>>>>>>
>>>>>>>> —
>>>>>>>> Mobile
>>>>>>>>
>>>>>>>> On Aug 30, 2024, at 10:25 PM, Bowen Song via dev <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>> 
>>>>>>>>
>>>>>>>> I'm not sure what is the concern here. Is it a malicious user
>>>>>>>> exploiting this? Or human error with unintended consequences?
>>>>>>>>
>>>>>>>> For malicious user, in order to exploit this, an attacker needs to
>>>>>>>> be able to write to the config file. The config file on Linux by 
>>>>>>>> default is
>>>>>>>> owned by the root user and has the -rw-r--r-- permission, that means 
>>>>>>>> the
>>>>>>>> attacker must either gain root access to the system or has the ability 
>>>>>>>> to
>>>>>>>> write arbitrary file on the filesystem. With either of these 
>>>>>>>> permission,
>>>>>>>> they can already do almost anything they want (e.g. modify a SUID
>>>>>>>> executable file). They wouldn't even need to exploit this to run a 
>>>>>>>> script
>>>>>>>> or dangerous command. So this sounds like a non-issue to me, at least 
>>>>>>>> on
>>>>>>>> Linux-based OSes.
>>>>>>>>
>>>>>>>> For human error, if the operator puts "rm -rf" in it, the software
>>>>>>>> should treat it as the operator actually wants to do that. I personally
>>>>>>>> don't like software attempting to outsmart human, which often ends up
>>>>>>>> interfering with legitimate use cases. The best thing a software can 
>>>>>>>> do is
>>>>>>>> log it, so there's some traceability if and when things go wrong.
>>>>>>>>
>>>>>>>> So, IMO, there's nothing wrong with the implementation in Cassandra.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 30/08/2024 17:13, guo Maxwell wrote:
>>>>>>>>
>>>>>>>>     Commitlog has the ability of archive  log file, see
>>>>>>>> CommitLogArchiver.java
>>>>>>>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java>,
>>>>>>>> we can achieve the purpose of archive and restore commitlog by
>>>>>>>> configuring archive_command and restore_command in
>>>>>>>> commitlog_archiving.properties
>>>>>>>> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
>>>>>>>> .The archive_command and restore_command can be some linux/unix
>>>>>>>> shell command.  However, I found that the shell command can
>>>>>>>> actually be filled with any script, even if "*rm -rf"* .I have
>>>>>>>> tested this situation and it finally succeeded with my test file being
>>>>>>>> deleted.
>>>>>>>>
>>>>>>>>     Personally, I think it is a dangerous behavior, because if
>>>>>>>> there are no system-level restrictions and users are allowed to do 
>>>>>>>> anything
>>>>>>>> in these shell commands. So here I want to discuss with you
>>>>>>>> whether it is necessary to impose any restrictions on use, or do we 
>>>>>>>> need a
>>>>>>>> new way of archiving/restoring commitlog?
>>>>>>>>
>>>>>>>> Of course, before that, I would also like to ask, how many people
>>>>>>>> are using archive and restore of commitlog? It seems that the commitlog
>>>>>>>> archive code has not been updated for a long time.
>>>>>>>>
>>>>>>>> I have two ideas.
>>>>>>>> One is to make some restrictions on the command context based on
>>>>>>>> the existing usage methods, such as strictly only allowing the current
>>>>>>>> cp/mv/ln %path to %name.Other redundant strings in the command are not
>>>>>>>> allowed.
>>>>>>>> Another one , As I roughly investigated the archive of mysql and
>>>>>>>> pg. They do not give users too much space (I am talking about letting 
>>>>>>>> users
>>>>>>>> define their own archiving command ), and archive directly to a 
>>>>>>>> designated
>>>>>>>> location. For us, I feel that we can refer to c * Incremental backup of
>>>>>>>> sstable,  add a hardlink to the commitlog to the specified location, 
>>>>>>>> but
>>>>>>>> this place may modify the original configuration method, such as 
>>>>>>>> setting
>>>>>>>> the archive location and restoring location of the node through 
>>>>>>>> nodetool
>>>>>>>> and deprecate the  commitlog_archiving.properties
>>>>>>>> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
>>>>>>>>  configuration.
>>>>>>>>
>>>>>>>> I am just putting forward some views  here, and looking forward to
>>>>>>>> your feedback. 😀
>>>>>>>>
>>>>>>>>

Re: 【DISCUSS】The configuration of Commitlog archiving

Reply via email to