On 2/4/11 Fri Feb 4, 2011 8:02 AM, "Harry Putnam" <[email protected]>
scribbled:
>
> One further question. In your formulation shown below:
> ,----
> | unless($filename =~ m(.+\.(bmp|gif|jpg|png|psd|tga|tif)$))
> | {
> | print STDERR "The filename {$filename} has an unsupported
> | extension. Skipping...";
> | next;
> | }
> `----
>
> I see that specifying exact possible extensions is good, but don't
> really see what the `m' does there. I'm not that informed on all
> incantations of perl regex but does that not anticipate filenames
> using multiple lines?
The leading 'm' is the "match" operator. It is optional if you use
forward slashes as the regex delimiter. Including an explicit 'm' to start
the regex allows you to use other delimiters: either a bracketing pair
('()', '{}', '[]', '<>') or almost any single punctuation character used
at both ends ('||', '!!', etc.). Since parentheses are used as the
delimiters in the above regex, the 'm' is required here.
See 'perldoc perlop' and search for "Regexp Quote-Like Operators".
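
A small sketch of the delimiter point, using a made-up file name (the
variable names are mine, purely for illustration). Each line matches the
same string with a different delimiter:

```perl
use strict;
use warnings;

my $filename = 'photo.jpg';   # hypothetical name, for illustration only

# With forward slashes as delimiters, the leading 'm' may be omitted.
my $slashes = $filename =~ /\.jpg$/    ? 1 : 0;

# With any other delimiter, the leading 'm' is required.
my $braces  = $filename =~ m{\.jpg$}   ? 1 : 0;
my $bangs   = $filename =~ m!\.jpg$!   ? 1 : 0;

# Bracketing delimiters may contain nested, balanced pairs of themselves.
my $parens  = $filename =~ m(\.(jpg)$) ? 1 : 0;

print "slashes=$slashes braces=$braces bangs=$bangs parens=$parens\n";
```

All four forms should report a match; the choice of delimiter changes only
how the pattern reads, not what it matches.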
>
> My take on `m' (from perlre) is that it basically replaces the meaning
> of ^ and $ from the common start of string and end of string, to start
> and end of any line anywhere in multiple lines.
That is correct for a trailing 'm', i.e. the /m modifier written after the
closing delimiter. The leading 'm' is unrelated to it.
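
To illustrate the difference, here is a minimal sketch with a made-up
two-line string. The leading 'm' is the same in both matches; only the
trailing 'm' changes what '$' means:

```perl
use strict;
use warnings;

my $text = "first.jpg\nsecond.txt";   # hypothetical two-line string

# Without a trailing 'm', '$' matches only at the end of the string
# (or just before a final newline), so ".jpg" on line one does not qualify.
my $plain = $text =~ m(\.jpg$)  ? 1 : 0;

# With a trailing 'm', '$' also matches just before each embedded newline.
my $multi = $text =~ m(\.jpg$)m ? 1 : 0;

print "plain=$plain multi=$multi\n";   # plain=0 multi=1
```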
>
> Further it seems that the use of parens for regex delimiters (at least
> in this instance) is somewhat confusing when its thrown in with at
> least 2 other uses of parens, making 3 different kinds of uses in one
> clause.
You can use other characters to make it more readable. However, all of the
normal delimiters such as {} and [] have other meanings within the regular
expression. Some overlap of meaning is unavoidable, and context-awareness is
required on the part of the reader.
> And finally the `.+' at the start of the regex seems to allow names
> such as $$.psd or ##.psd or %.psd and the like. Kind of undoing your
> effort to enforce strict extensions by allowing weird and even
> unusable names on the other end.
I think that the '+' is not necessary: '.+' only requires that at least one
character appear before the period in the file name, and the same effect
would be achieved with a single '.' character. Leaving it off entirely would
mean that names such as '.jpg' and '.png' would match.
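
A quick sketch comparing the three variants on a few made-up names (the
names and the %result hash are only for illustration):

```perl
use strict;
use warnings;

# The extension part of the pattern, shared by all three variants.
my $ext = qr/\.(?:bmp|gif|jpg|png|psd|tga|tif)$/;

my %result;
for my $name ('a.jpg', 'photo.jpg', '.jpg', '$$.psd') {
    $result{$name} = [
        ($name =~ /.+$ext/ ? 1 : 0),   # '.+' before the period
        ($name =~ /.$ext/  ? 1 : 0),   # a single '.' before the period
        ($name =~ /$ext/   ? 1 : 0),   # nothing required before the period
    ];
    printf "%-10s  .+ => %d   . => %d   (none) => %d\n",
        $name, @{ $result{$name} };
}
```

The '.+' and '.' columns should agree for every name, and only the last
column should accept the bare '.jpg'. Note that '$$.psd' is accepted by all
three, since '.' matches any character except a newline.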
>
> Would `\w+' have served better or am I really missing the boat all
> the way round?
>
It all depends upon what you want to match and what you want to exclude.
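Some file names will not be matched by \w+\.(jpg|png|...), namely those
with anything other than a letter, digit, or underscore immediately before
the period.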
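
As a rough sketch of what \w+ does and does not exclude (the file names are
made up, and the '^' anchor in the second pattern is my own addition, not
part of the regex under discussion):

```perl
use strict;
use warnings;

# As written: \w+ only constrains the characters just before the period.
my $unanchored = qr/\w+\.(?:jpg|png|psd)$/;
# With '^' added (an assumption for comparison): the whole base name
# must consist of word characters.
my $anchored   = qr/^\w+\.(?:jpg|png|psd)$/;

my %seen;
for my $name ('photo.jpg', '$$.psd', 'my-photo.jpg', 'photo(1).jpg') {
    $seen{$name} = {
        unanchored => ($name =~ $unanchored ? 1 : 0),
        anchored   => ($name =~ $anchored   ? 1 : 0),
    };
    printf "%-14s  unanchored => %d   anchored => %d\n",
        $name, $seen{$name}{unanchored}, $seen{$name}{anchored};
}
```

Without the anchor, 'my-photo.jpg' still matches because \w+ is satisfied
by "photo" alone, while 'photo(1).jpg' is rejected in both cases. Whether
that is what you want depends on which names you consider legitimate.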
The only improvement I could suggest (and it is only a small difference in
speed) is to make the grouping parentheses non-capturing:
m(.\.(?:bmp|gif|jpg|png|psd|tga|tif)$)
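You can also use extended syntax (a trailing 'x', which lets you put
whitespace in the pattern) and the \z zero-width assertion, which matches
only at the very end of the string, to make your regex more readable: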
m{ . \. (?: bmp|gif|jpg|png|psd|tga|tif ) \z }x
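
A short sketch of both suggestions, again with a made-up file name. It
shows what the capturing group captures, and where '$' and \z differ (a
string with a trailing newline):

```perl
use strict;
use warnings;

my $name = 'photo.jpg';   # hypothetical name, for illustration only

# Capturing group: in list context the match returns the captured extension.
my ($captured) = $name =~ m(.\.(bmp|gif|jpg|png|psd|tga|tif)$);

# Non-capturing group: same match, but nothing is stored.
my $matched = $name =~ m(.\.(?:bmp|gif|jpg|png|psd|tga|tif)$) ? 1 : 0;

# '$' tolerates a single trailing newline; '\z' does not.
my $with_newline = "photo.jpg\n";
my $dollar = $with_newline =~ m{ . \. (?: bmp|gif|jpg|png|psd|tga|tif ) $}x  ? 1 : 0;
my $z      = $with_newline =~ m{ . \. (?: bmp|gif|jpg|png|psd|tga|tif ) \z }x ? 1 : 0;

print "captured=$captured matched=$matched dollar=$dollar z=$z\n";
```

If your file names come from a line-oriented source that has not been
chomp()ed, the \z form will reject them, which may or may not be what you
want.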