@Nicolas i hope mediawiki doesn't run on Windows, because that escapeshellarg-replacement you did is not valid for windows.
the code <?php function e(string $str):string{ return "'" . str_replace( "'", "'\\''", $str ) . "'"; } $arg = "foo && whoami && echo "; $cmd = "echo ".e($arg); echo $cmd; ?> prints: echo 'foo && whoami && echo ' and when i run that in bash i get: C:\Users\hansh>echo 'foo && whoami && echo ' 'foo laptop-1plmku02\hansh ' - whoami was executed even though it ran through mediawiki's escapeshellarg replacement PS creating a proper escapeshellarg for windows is actually difficult, see https://docs.microsoft.com/en-gb/archive/blogs/twistylittlepassagesallalike/everyone-quotes-command-line-arguments-the-wrong-way (and even php's upstream escapeshellarg() doesn't get it quite right on Windows, corrupting more data than it strictly needs to corrupt, but i don't recall the specifics) On Mon, 11 Oct 2021 at 09:38, Nicolas Grekas <nicolas.grekas+...@gmail.com> wrote: > Le lun. 11 oct. 2021 à 03:33, Tim Starling <tstarl...@wikimedia.org> a > écrit : > > > On 4/10/21 9:08 pm, Nikita Popov wrote: > > > > > > Hi Tim, > > > > > > Thanks for creating this proposal, it looks great! > > > > > > I think this is a very beneficial change, and the amount of > > > incorrect locale-dependent calls we had just in php-src further > > > convinced me of this: We're generally aware of the problem, and we > > > still made this mistake. Many times. > > > > > > The only open question I have is regarding the ctype_* functions. > > > One might argue that these functions should be locale-independent as > > > well. Certainly, whenever I have used ctype_digit() I only intended > > > it to match [0-9]. It seems like some people try to use > > > ctype_alpha() in a locale-sensitive way > > > ( > > > https://stackoverflow.com/questions/19929965/php-setlocale-not-working-for-ctype-alpha-check > > > < > > > https://stackoverflow.com/questions/19929965/php-setlocale-not-working-for-ctype-alpha-check > > >) > > > and then fail because it doesn't support UTF-8. > > > > > OK, I removed ctype_tolower() and ctype_toupper() from the RFC and the > > PR since they would be incompatible with a move towards a > > locale-independent ctype extension. > > > > The non-controversial parts of the PR were split and merged, so I > > rebased the PR and updated the RFC accordingly. > > > > Do you think the RFC is ready for voting now? > > > > > > > PS: Regarding escapeshellarg(), are you aware of the array command > > > support for proc_open() that was added in PHP 7.4? That does away > > > the need to escape arguments. > > > > It doesn't really help us. I recently wrote a new shell command > > execution system for MediaWiki called Shellbox. As part of that > > project, I reviewed how shell execution is used in the MediaWiki > > ecosystem. There are a lot of callers which are using shell features, > > for example redirecting inputs or outputs, or constructing pipelines. > > I didn't really want to break them all or reimplement those features > > without the shell. And we have security and containerization wrappers > > which depend on construction of a shell command string. So we need to > > be able to construct shell command strings safely. > > > > After studying locale sensitivity for this RFC, I decided to get rid > > of escapeshellarg() from MediaWiki. Instead we are doing our own shell > > escaping: > > > > https://gerrit.wikimedia.org/r/c/mediawiki/libs/Shellbox/+/722548 > > > > I also made MediaWiki use a fixed locale, instead of being configurable. > > > > Hi Tim, > > thanks for the RFC and for the above pointers, I'm going to have a look at > Symfony Process to follow your lead! > > About the RFC, I just have one note: > > > I didn't include strnatcasecmp() and natcasesort() in this RFC, because > they also use isdigit() and isspace(), and because they are intended for > natural language processing. They could be migrated in future. > > Despite their name, I never used *natcase* functions for natural language > processing. I use them eg to sort lists of files in a directory, to account > for numbers mainly. But that's not what I would call natural language > processing. I'm not aware of anyone using them for that actually. I'm > wondering if it's a good idea to postpone migrating them to an hypothetical > future as to me, the whole reasoning of the RFC applies to them. > > Regards, > Nicolas >