On Mon, Jan 16, 2023, at 21:42, Craig Sanders via luv-main wrote:
> On Fri, Jan 13, 2023 at 10:39:02PM +1100, Les Kitchen wrote:
>> I'd do something like:
>>
>> find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i $o $_") 
>> if $_ ne $o;'

Thanks, Craig, for your followup.

> This is quite dangerous, for several reasons.  To start with, there's no
> protection against renaming files over existing files with the same target
> name.
...

Well, that's the intention of the -i (interactive) option to mv,
to require user agreement before over-writing existing files.

All the other points you raise are valid, especially the dangers
of feeding unchecked input into the shell, and anybody writing
shell code needs to be aware of them — although I will say I
mentioned pretty much all of them in the notes further down in
my message, though in less detail than you have, and without
stressing the dangers enough.

If you know you've got a modest number of files, with
well-behaved file names (in the Unix-shell sense, that is, no
whitespace, no shell metacharacters, etc.), and no changes to
the directory structure, then the approach I suggested can work
quite well.  And one big advantage is that you can review the
generated list of shell commands, and check that they'll do what
you expect before committing to executing them.
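Concretely, the workflow is something like this (a sketch only,
assuming well-behaved names; rename.sh is just an illustrative
file name):

    # generate the commands into a file instead of running them
    find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i $o $_") if $_ ne $o;' > rename.sh

    # review them at leisure
    less rename.sh

    # and only then execute
    sh rename.sh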

And, yes, if you have filenames with arbitrary characters, then
you have to resort to other contrivances, ultimately to
NULL-terminated strings.  And, yes, if you have a huge number of
files, then you'd likely want to do the rename internal to your
scripting language, instead of forking a new process for each
file rename.  But then you lose the easy ability to review the
commands before they're executed.  Of course there are
workarounds.  You could define, say, a Perl routine that in one
definition just prints the old and new names in the renaming,
and in another definition actually does the rename.  But then
you're getting well beyond the simple one-liner (unless you use
that Perl rename program you mention later).
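A rough sketch of that idea (the name ren and the overall shape
are mine, just for illustration; feed it from find /Dir1 -type f):

    # Dry-run version: just print the commands for review.
    sub ren { print "mv -i $_[0] $_[1]\n" }
    # Swap in this definition once you're happy with the output:
    # sub ren { rename($_[0], $_[1]) or warn "$_[0]: $!\n" }

    while (<STDIN>) {
        chomp;
        my $o = $_;
        (my $n = $o) =~ s/\.junk\././;
        ren($o, $n) if $n ne $o;
    }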

And I could also mention the potential for filenames to contain
UTF-8 (or other encodings) for characters that just happen to
look like ASCII characters, but aren't, or to contain
terminal-control escape sequences.  It can get very weird.

In general, there's a big big difference between a simple shell
one-liner that you use as a work amplifier in situations you
know are well-behaved, and a piece of robust code that can
behave gracefully no matter what weird input is thrown at it.
They're different use-cases.

> It also doesn't distinguish between .junk. in a directory name vs in a file
> name - it will just modify the first instance of ".junk." it sees in the
> pathname. e.g. "./Dir1/My.junk.dir/my.junk.file.txt".  Probably not a problem
> in practice, but something to be aware of.

Yeah.

> Worse, it will break if any filenames contain whitespace characters (newlines,
> tabs, spaces, etc - all of which are completely valid in filenames - the ONLY
> characters guaranteed NOT to be in a pathname are / and NUL).

This should be taped to the screen of every shell user.
Actually, files with spaces are pretty common in non-Unix
environments, like Windows or MacOS (yes, I know it's Unix
underneath), but they're pretty simple to handle by
double-quoting, as I mentioned in my notes — and that will
handle pretty much everything except for the characters that
still interpolate inside double-quoted strings: $ and `
(backtick), plus backslash, and possibly ! where history
expansion is enabled.
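For example (a contrived name, just to show the failure mode):

    # Double quotes cope fine with spaces:
    mv -i "My File.junk.txt" "My File.txt"

    # But ` still interpolates inside double quotes, so a generated
    # command like the following would run date(1) when executed:
    #   mv -i "report `date`.junk.txt" "report `date`.txt"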

> And because you're NOT quoting the filenames in your print statement, it
> will also break if any filenames contains shell metacharacters like ; & > <
> etc when the output is piped into sh. A simple fix might appear to be to use
> single-quotes in the print statement - e.g. print("mv -i '$o' '$_'") - but
> even this will break if a filename contains a single-quote character. Similar
> for escaped double-quotes.

It's even messier than this.  Because you're already using
single quotes for the -e expression to Perl, you can't
immediately use them like that.  You have to do something like
'"'"' to close the single-quoted string, then attach a
double-quoted single quote, then open a new single-quoted
string.  I don't even want to think about it.  By then you might
as well shift to using NULL-terminated strings.
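If you're morbidly curious, it looks something like this (still
broken, of course, the moment a filename itself contains a
single quote):

    # Each '"'"' is: close the single-quoted string, a double-quoted
    # single quote, then reopen a single-quoted string.
    find /Dir1 -type f | perl -lne '$o=$_; s/\.junk\././; print("mv -i '"'"'$o'"'"' '"'"'$_'"'"'") if $_ ne $o;'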

> Shell can be very dangerous if you don't quote your arguments properly.
> Consider, for example, what would happen if there happened to be a file called
> ";rm --no-preserve-root -rf /;" (or ";sudo rm ....;") under /Dir1.  That's
> a fairly extreme example of an obviously malicious filename, but there are
> plenty of legitimate, seemingly innocuous filenames that WILL cause problems
> if passed unquoted to the shell.
>
> Whitespace and quoting issues in shell are well-known and long-standing,
> and pretty much inherent to the way the shell parses its command line - the
> subject of many FAQs and security advisories.
>
> It's unfortunately very easy to improperly quote filenames - it's far harder
> to do correctly and 100% safely than it seems at first glance.
>
> For safety, if you were to DIY it with a command like yours above (there are
> far better alternatives), you should use -print0 with find and the -0 option
> with perl.
>
> In fact, you should use NUL as the separator with ANY program dealing with
> arbitrary filenames on stdin - most standard tools these days have -0 (or
> -z or -Z) options for using NUL as the separator, including most of GNU
> coreutils etc (head, tail, cut, sort, grep, sed, etc. For awk, you can use
> BEGIN {RS="\0"} or similar).

I'm in full agreement with all this, except to note that life's
much easier if you know for sure that you're working only with
well-behaved files — which is often the case — because then you
can use most of the standard utilities in their simple forms.

And I guess if you're worried, it's pretty straightforward to
write (say) a find command that will go through some directory
structure you plan to work on, and complain if it contains any
ill-behaved file names.  Alternatively, you could use the -path
or -regex options to find to match only on well-behaved
file-names for processing, or put a regular-expression match
into the Perl expression.  And "well-behaved" depends on what
possibilities you want to deal with.
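For instance, something like this is a start (a sketch only;
this particular character class is just one notion of
well-behaved, and -name checks only the last path component):

    # Complain about any file whose name contains a character
    # outside a conservative safe set:
    find /Dir1 -name '*[!A-Za-z0-9._-]*'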

And the whole point of this approach is that you generate the
list of shell commands, and then review them before committing
to execution.  It's for possibly a few hundred files, not
millions.  If there are cases not handled by your relatively
simple code, you should spot them in the review, before anything
has actually been run.  If there are just a few such cases, then
maybe you tweak your code, or maybe you just deal with them
manually.
It's a matter of writing simple (one-liner) code that can handle
enough of the cases that there's little or nothing to do
manually.

> 1. perl has a built-in rename function, there's no need to fork mv (which
> would be extremely slow if there are lots of files to rename).  And perl
> isn't shell, so doesn't have problems with unquoted whitespace or shell
> metacharacters in the filenames.  Still doesn't protect against clobbering
> existing filenames without some extra code, though:

Good point, as I mentioned above, if you have a huge number of
files to rename.
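
Something along these lines, I'd guess (an untested sketch; same
caveat as before about .junk. in directory names, and the -e
test only roughly guards against clobbering):

    find /Dir1 -type f -print0 | perl -0ne '
        chomp;                                   # strip the trailing NUL
        my $o = $_;
        (my $n = $o) =~ s/\.junk\././;
        next if $n eq $o;
        if (-e $n) { warn "not clobbering: $n\n"; next; }
        rename($o, $n) or warn "rename $o failed: $!\n";
    '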

...

> 2. Even better, a perl rename utility (aka file-rename, perl-rename, prename,
> etc as mentioned in my previous message in this thread) already exists and
> won't overwrite existing files unless you force it to with the -f option.
>
> It also distinguishes between directories and file names (by default, it will
> only rename the filename portion of a pathname unless you use the --path or
> --fullpath option).  It can take filenames from stdin (and has a -0 option for
> NUL-separated filenames) or as command-line args (e.g. with 'find ... -exec
> rename .... {} +')

Yeah, that Perl rename utility is worth knowing about.  I've
used it from time to time.

Really, though, my main point is that instead of writing code to
do something immediately (which might be dangerous), you can
write code to generate a simple list of shell commands, which
you can review before executing (paged, say, through less if
you have more than a screenful).  And that's a tactic that
applies more generally than just to file renaming.  Of course,
it's viable only for a modest number of files with reasonably
well-behaved names, but that's a common enough scenario for the
tactic to be a useful one to have in your mental toolkit.


— Smiles, Les.