I'm not sure what the concern is here. Is it a malicious user exploiting
this, or human error with unintended consequences?
For a malicious user to exploit this, the attacker needs to be able to
write to the config file. On Linux, the config file is owned by the root
user by default and has -rw-r--r-- permissions, which means the attacker
must either gain root access to the system or be able to write arbitrary
files on the filesystem. With either of these capabilities, they can
already do almost anything they want (e.g. modify a SUID executable
file). They wouldn't even need to exploit this to run a script or a
dangerous command. So this sounds like a non-issue to me, at least on
Linux-based OSes.
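To make the permission argument concrete, here is a minimal sketch in
Python (not Cassandra code; both function names are made up for
illustration) of the check an operator could run to confirm that only
root can modify the config file:

```python
import os
import stat


def writable_by_non_owner(mode: int) -> bool:
    """Return True if group or others have write permission in this st_mode."""
    return bool(stat.S_IMODE(mode) & (stat.S_IWGRP | stat.S_IWOTH))


def config_is_locked_down(path: str) -> bool:
    """Return True if the file is owned by root (uid 0) and only its owner can write it."""
    st = os.stat(path)
    return st.st_uid == 0 and not writable_by_non_owner(st.st_mode)
```

With the default -rw-r--r-- (0644) mode, writable_by_non_owner() returns
False, so anyone able to change the file already has root-level write
access.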
As for human error, if the operator puts "rm -rf" in it, the software
should assume the operator actually wants to do that. I personally
don't like software attempting to outsmart humans, which often ends up
interfering with legitimate use cases. The best thing software can do
is log it, so there's some traceability if and when things go wrong.
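As a rough illustration of what "log it" could look like, here is a
small Python sketch. It is not how Cassandra implements this; the logger
name, the function names, and the choice to split the command into
arguments without a shell are all assumptions made for the example.

```python
import logging
import shlex
import subprocess

log = logging.getLogger("commitlog.archiver")  # hypothetical logger name


def expand_command(template: str, path: str, name: str) -> str:
    """Substitute the %path and %name placeholders into the configured command."""
    return template.replace("%path", path).replace("%name", name)


def run_logged(command: str, runner=subprocess.run):
    """Log the exact command before running it, then log its exit status."""
    log.info("Executing archive command: %s", command)
    # Assumption: the command is split into arguments and run without a shell.
    result = runner(shlex.split(command))
    log.info("Archive command exited with status %s", result.returncode)
    return result
```

The runner parameter exists only so the logging path can be exercised
without spawning a real process.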
So, IMO, there's nothing wrong with the implementation in Cassandra.
On 30/08/2024 17:13, guo Maxwell wrote:
Commitlog has the ability to archive log files, see
CommitLogArchiver.java
<https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java>.
We can archive and restore the commitlog by configuring archive_command
and restore_command in commitlog_archiving.properties
<https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>.
The archive_command and restore_command can be any linux/unix shell
command. However, I found that the shell command can actually be filled
with any script, even "rm -rf". I have tested this, and it succeeded:
my test file was deleted.
Personally, I think this is dangerous behavior, because there are no
system-level restrictions and users are allowed to do anything in
these shell commands. So here I want to discuss with you whether it is
necessary to impose any restrictions on use, or whether we need a new
way of archiving/restoring the commitlog.
Of course, before that, I would also like to ask, how many people are
using archive and restore of commitlog? It seems that the commitlog
archive code has not been updated for a long time.
I have two ideas.
One is to restrict the command content based on the existing usage,
such as strictly allowing only the current cp/mv/ln %path to %name
form. Other redundant strings in the command would not be allowed.
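The first idea could be sketched roughly as follows, in Python for
brevity. This is only an illustration, not a proposed patch: the
pattern, the optional /bin/ prefix, and the function name are
assumptions, and a real check would also have to decide which paths and
flags are acceptable.

```python
import re

# Accept only "<cp|mv|ln> %path <absolute directory>/%name" and nothing else.
# The optional "/bin/" prefix and the allowed directory characters are assumptions.
ALLOWED_ARCHIVE_COMMAND = re.compile(
    r"(?:/bin/)?(?:cp|mv|ln) %path (?:/[A-Za-z0-9._-]+)+/%name"
)


def is_allowed_archive_command(command: str) -> bool:
    """Return True only if the whole command matches the restricted form."""
    return ALLOWED_ARCHIVE_COMMAND.fullmatch(command.strip()) is not None
```

Because the whole string must match, "rm -rf ..." is rejected, and so is
anything appended after %name. Note that this sketch also rejects
command flags such as "cp -p".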
The other: I roughly investigated how mysql and pg do archiving. They
do not give users much freedom (I am talking about letting users
define their own archiving command) and archive directly to a
designated location. For us, I feel that we can follow C*'s
incremental backup of sstables and add a hardlink of the commitlog to
the specified location. This may change the original configuration
method, though, such as setting the archive location and restore
location of the node through nodetool and deprecating the
commitlog_archiving.properties
<https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
configuration.
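The hardlink idea could look roughly like the sketch below (Python, for
illustration only; the function name and the directory layout are
assumptions, and this is not existing Cassandra behavior):

```python
import os


def archive_by_hardlink(segment_path: str, archive_dir: str) -> str:
    """Hard-link a commitlog segment into archive_dir and return the new path."""
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, os.path.basename(segment_path))
    # A hard link needs source and target on the same filesystem;
    # os.link raises OSError otherwise, or if the target already exists.
    os.link(segment_path, target)
    return target
```

Since no user-supplied command is executed, the only thing an operator
configures here is the archive location.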
I am just putting forward some views here, and looking forward to
your feedback. 😀