On Mon, Jan 16, 2023, at 21:42, Craig Sanders via luv-main wrote:
> On Fri, Jan 13, 2023 at 10:39:02PM +1100, Les Kitchen wrote:
>> I'd do something like:
>>
>> find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i $o $_")
>> if $_ ne $o;'
Thanks, Craig, for your followup.

> This is quite dangerous, for several reasons. To start with, there's no
> protection against renaming files over existing files with the same target
> name. ...

Well, that's the intention of the -i (interactive) option to mv: to require user agreement before over-writing existing files.

All the other points you raise are valid, especially the dangers of feeding unchecked input into the shell, and anybody writing shell code needs to be aware of them. I will say, though, that I mentioned pretty much all of them in the notes further down in my message, in less detail than you have, and without stressing the dangers enough.

If you know you've got a modest number of files with well-behaved file names (in the Unix-shell sense, that is, no whitespace, no shell metacharacters, etc.), and no changes to the directory structure, then the approach I suggested can work quite well. And one big advantage is that you can review the generated list of shell commands, and check that they'll do what you expect, before committing to executing them.

And, yes, if you have filenames with arbitrary characters, then you have to resort to other contrivances, ultimately to NUL-terminated strings. And, yes, if you have a huge number of files, then you'd likely want to do the rename internal to your scripting language, instead of forking a new process for each file rename. But then you lose the easy ability to review the commands before they're executed. Of course there are workarounds. You could define, say, a Perl routine that in one definition just prints the names in the file renaming, and in another definition actually does the rename. But then you're getting well beyond the simple one-liner (unless you use that Perl rename program you mention later).
And I could also mention the potential for filenames to contain UTF-8 (or other encodings) for characters that just happen to look like ASCII characters but aren't, or to contain terminal-control escape sequences. It can get very weird. In general, there's a big, big difference between a simple shell one-liner that you use as a work amplifier in situations you know are well-behaved, and a piece of robust code that can behave gracefully no matter what weird input is thrown at it. They're different use-cases.

> It also doesn't distinguish between .junk. in a directory name vs in a file
> name - it will just modify the first instance of ".junk." it sees in the
> pathname. e.g. "./Dir1/My.junk.dir/my.junk.file.txt". Probably not a problem
> in practice, but something to be aware of.

Yeah.

> Worse, it will break if any filenames contain whitespace characters (newlines,
> tabs, spaces, etc - all of which are completely valid in filenames - the ONLY
> characters guaranteed NOT to be in a pathname are / and NUL).

This should be taped to the screen of every shell user. Actually, files with spaces are pretty common in non-Unix environments, like Windows or MacOS (yes, I know it's Unix underneath), but they're pretty simple to handle by double-quoting, as I mentioned in my notes. That will handle pretty much everything except for the characters that interpolate into double-quoted strings: I guess $, ` (backtick), and possibly !.

> And because you're NOT quoting the filenames in your print statement, it
> will also break if any filenames contains shell metacharacters like ; &
> < > etc when the output is piped into sh. A simple fix might appear to be to use
> single-quotes in the print statement - e.g. print("mv -i '$o' '$_'") - but
> even this will break if a filename contains a single-quote character. Similar
> for escaped double-quotes.

It's even messier than this.
Because you're already using single quotes for the -e expression to Perl, you can't immediately use them like that. You have to do something like '"'"' to close the single-quoted string, attach a double-quoted single quote, then open a new single-quoted string. I don't even want to think about it. By then you might as well shift to using NUL-terminated strings.

> Shell can be very dangerous if you don't quote your arguments properly.
> Consider, for example, what would happen if there happened to be a file called
> ";rm --no-preserve-root -rf /;" (or ";sudo rm ....;") under /Dir1. That's
> a fairly extreme example of an obviously malicious filename, but there are
> plenty of legitimate, seemingly innocuous filenames that WILL cause problems
> if passed unquoted to the shell.
>
> Whitespace and quoting issues in shell are well-known and long-standing,
> and pretty much inherent to the way the shell parses its command line - the
> subject of many FAQs and security advisories.
>
> It's unfortunately very easy to improperly quote filenames - it's far harder
> to do correctly and 100% safely than it seems at first glance.
>
> For safety, if you were to DIY it with a command like yours above (there are
> far better alternatives), you should use -print0 with find and the -0 option
> with perl.
>
> In fact, you should use NUL as the separator with ANY program dealing with
> arbitrary filenames on stdin - most standard tools these days have -0 (or
> -z or -Z) options for using NUL as the separator, including most of GNU
> coreutils etc (head, tail, cut, sort, grep, sed, etc. For awk, you can use
> BEGIN {RS="\0"} or similar).

I'm in full agreement with all this, except that life's much easier if you know for sure that you're working only with well-behaved files (which is often the case); then you can use most of the standard utilities in their simple forms.
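Concretely, the -print0/-0 version with the rename done inside Perl might look like this sketch (demo names invented for illustration; note that Perl's built-in rename will happily clobber an existing target, so the -e test supplies the protection):

```shell
# Hypothetical demo tree: a space in one name, plus a clobber conflict.
cd "$(mktemp -d)"
mkdir Dir1
touch 'Dir1/a b.junk.txt' Dir1/c.junk.txt Dir1/c.txt

# NUL separators end to end; no shell ever parses the filenames.
find Dir1 -type f -print0 |
  perl -0ne 'chomp;                   # strip the trailing NUL
    my $n = $_;
    $n =~ s/\.junk\././ or next;      # nothing to rename in this path
    if (-e $n) { warn "skipping $_: $n already exists\n"; next }
    rename($_, $n) or warn "rename $_ -> $n failed: $!\n";'
```

Here "Dir1/a b.junk.txt" is renamed safely despite the space, while "Dir1/c.junk.txt" is left alone because "Dir1/c.txt" already exists. The trade-off I mentioned applies: there's no command list to review before anything happens.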
And I guess if you're worried, it's pretty straightforward to write, say, a find command that will go through some directory structure you plan to work on, and complain if it contains any ill-behaved file names. Alternatively, you could use the -path or -regex options to find to match only well-behaved file names for processing, or put a regular-expression match into the Perl expression. And "well-behaved" depends on what possibilities you want to deal with.

And the whole point of this approach is that you generate the list of shell commands, and then review them before committing to execution. It's for maybe a few hundred files, not millions. If there are cases not handled by your relatively simple code, you'll see what went wrong. If there are just a few cases your simple code can't handle, then maybe you tweak your code, or maybe you just deal with those cases manually. It's a matter of writing simple (one-liner) code that can handle enough of the cases that there's little or nothing left to do manually.

> 1. perl has a built-in rename function, there's no need to fork mv (which
> would be extremely slow if there are lots of files to rename). And perl
> isn't shell, so doesn't have problems with unquoted whitespace or shell
> metacharacters in the filenames. Still doesn't protect against clobbering
> existing filenames without some extra code, though:

Good point, as I mentioned above, if you have a huge number of files to rename.

...

> 2. Even better, a perl rename utility (aka file-rename, perl-rename, prename,
> etc as mentioned in my previous message in this thread) already exists and
> won't overwrite existing files unless you force it to with the -f option.
>
> It also distinguishes between directories and file names (by default, it will
> only rename the filename portion of a pathname unless you use the --path or
> --fullpath option). It can take filenames from stdin (and has a -0 option for
> NUL-separated filenames) or as command-line args (e.g.
> with 'find ... -exec rename .... {} +')

Yeah, that Perl rename utility is worth knowing about. I've used it from time to time.

Really, though, my main point is that instead of writing code to do something immediately (which might be dangerous), you can write code to generate a simple list of shell commands, which you can review before executing (paged, say, through less if you have more than a screenful). And that's a tactic that applies more generally than just to file renaming. Of course, it's viable only for a modest number of files with reasonably well-behaved names, but that's a common enough scenario for the tactic to be a useful one to have in your mental toolkit.

— Smiles, Les.

_______________________________________________
luv-main mailing list -- luv-main@luv.asn.au
To unsubscribe send an email to luv-main-le...@luv.asn.au