On Fri, Feb 4, 2011, 8:02 AM, "Harry Putnam" <rea...@newsguy.com>
scribbled:

> 
> One further question.  In your formulation shown below:
> ,---- 
> |     unless($filename =~ m(.+\.(bmp|gif|jpg|png|psd|tga|tif)$))
> |     {
> |         print STDERR "The filename {$filename} has an unsupported
> | extension. Skipping...";
> |         next;
> |     }
> `----
> 
> I see that specifying exact possible extensions is good, but don't
> really see what the `m' does there.  I'm not that informed on all
> incantations of perl regex but does that not anticipate filenames
> using multiple lines?

The leading 'm' is the "match" operator. It is optional if you use
forward slashes as the regex delimiter. Including an explicit 'm' lets you
use other delimiters instead: either a bracketing pair ('()', '{}', '[]',
'<>') or a single character repeated at both ends (such as '|' or '#').
Since parentheses are used as the delimiters in the above regex, the 'm' is
required here.

See 'perldoc perlop' and search for "Regexp Quote-Like Operators".
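
As a small sketch of the delimiter rule (the file name and variable names
here are made up for illustration), the same test can be written with
several delimiters. Only the slash form may drop the 'm':

```perl
use strict;
use warnings;

# Hypothetical file name, used only for this illustration.
my $filename = 'holiday.jpg';

# Forward slashes: the leading 'm' may be omitted.
my $slashes = ($filename =~ /\.jpg$/)  ? 1 : 0;

# Any other delimiter needs the explicit 'm'.
my $parens  = ($filename =~ m(\.jpg$)) ? 1 : 0;
my $braces  = ($filename =~ m{\.jpg$}) ? 1 : 0;
my $pipes   = ($filename =~ m|\.jpg$|) ? 1 : 0;

print "slashes=$slashes parens=$parens braces=$braces pipes=$pipes\n";
# prints: slashes=1 parens=1 braces=1 pipes=1
```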

> 
> My take on `m' (from perlre) is that it basically replaces the meaning
> of ^ and $ from the common start of string and end of string, to start
> and end of any line anywhere in multiple lines.

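That is correct for a trailing 'm', i.e. the /m modifier that follows the
closing delimiter. The leading 'm' has nothing to do with multiple lines.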
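
To see the difference, here is a minimal sketch with a made-up two-line
string. The pattern is identical in both tests; only the trailing /m changes
what '$' is allowed to match:

```perl
use strict;
use warnings;

# Hypothetical two-line string, used only for this illustration.
my $text = "first.jpg\nsecond.txt";

# Without /m, '$' matches only at the end of the string
# (or just before a newline that ends the string).
my $without_m = ($text =~ /\.jpg$/)  ? 1 : 0;

# With /m, '$' also matches just before any embedded newline.
my $with_m    = ($text =~ /\.jpg$/m) ? 1 : 0;

print "without /m: $without_m, with /m: $with_m\n";
# prints: without /m: 0, with /m: 1
```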

> 
> Further it seems that the use of parens for regex delimiters (at least
> in this instance) is somewhat confusing when its thrown in with at
> least 2 other uses of parens, making 3 different kinds of uses in one
> clause.

You can use other characters to make it more readable. However, all of the
normal delimiters such as {} and [] have other meanings within the regular
expression. Some overlap of meaning is unavoidable, and context-awareness is
required on the part of the reader.

> And finally the `.+' at the start of the regex seems to allow names
> such as $$.psd or ##.psd or %.psd  and the like.  Kind of undoing your
> effort to enforce strict extensions by allowing weird and even
> unusable names on the other end.

I think the '.+' is more than is needed. All it does is require that at
least one character appear before the period in the file name, and a single
'.' character achieves the same effect. Leaving it off entirely would mean
that names such as '.jpg' and '.png' would match.
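
A quick way to check this is to run the three variants side by side. This is
only a sketch; the sample names and variable names are invented for the
comparison:

```perl
use strict;
use warnings;

my $dot_plus = qr{.+\.(bmp|gif|jpg|png|psd|tga|tif)$};  # original form
my $one_dot  = qr{.\.(bmp|gif|jpg|png|psd|tga|tif)$};   # single '.'
my $no_dot   = qr{\.(bmp|gif|jpg|png|psd|tga|tif)$};    # nothing before the period

# Hypothetical sample names.
my @names = ('photo.jpg', '.jpg', '$$.psd', 'notes.txt');

my %result;
for my $name (@names) {
    # Record 1 or 0 for each of the three patterns, in the order above.
    $result{$name} = join ' ',
        map { $name =~ $_ ? 1 : 0 } ($dot_plus, $one_dot, $no_dot);
    printf "%-10s %s\n", $name, $result{$name};
}
# prints:
# photo.jpg  1 1 1
# .jpg       0 0 1
# $$.psd     1 1 1
# notes.txt  0 0 0
```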

> 
> Would `\w+' have served better or am I really missing the boat all
> the way round? 
> 

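It all depends upon what you want to match and what you want to exclude.
Some file names will not be matched by \w+\.(jpg|png|...)$, namely any name
whose last character before the period is not a word character. Note also
that without a leading ^ the \w+ only constrains the characters immediately
before the period, so a name such as '$x.psd' would still match.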
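
Here is a rough sketch of what \w+ accepts and rejects, with and without a
leading ^ anchor. The sample names, the shortened extension list, and the
anchored variant are my own additions for illustration:

```perl
use strict;
use warnings;

my $unanchored = qr{\w+\.(?:jpg|png|psd)$};   # \w+ may start anywhere in the name
my $anchored   = qr{^\w+\.(?:jpg|png|psd)$};  # whole base name must be word characters

# Hypothetical sample names.
my @names = ('photo.jpg', 'my-photo.jpg', 'photo (1).jpg', '$$.psd', '$x.psd');

my %result;
for my $name (@names) {
    $result{$name} = join ' ',
        map { $name =~ $_ ? 1 : 0 } ($unanchored, $anchored);
    printf "%-14s %s\n", $name, $result{$name};
}
# prints:
# photo.jpg      1 1
# my-photo.jpg   1 0
# photo (1).jpg  0 0
# $$.psd         0 0
# $x.psd         1 0
```

So \w+ does reject '$$.psd', but it also rejects an ordinary name such as
'photo (1).jpg', which may or may not be what you want.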

The only improvement I could suggest (and it is only a small difference in
speed) is to make the grouping parentheses non-capturing:

    m(.\.(?:bmp|gif|jpg|png|psd|tga|tif)$)

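You can also use extended syntax (the trailing /x modifier, which lets you
put whitespace in the pattern) and the \z zero-width assertion to make your
regex more readable. Unlike $, which also matches just before a trailing
newline, \z matches only at the very end of the string: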

    m{ . \. (?: bmp|gif|jpg|png|psd|tga|tif ) \z }x
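
As a sketch of where '$' and '\z' differ (the sample strings are invented,
and the second one has a trailing newline such as you would get from a line
that has not been chomped):

```perl
use strict;
use warnings;

my $with_dollar = qr{.\.(?:bmp|gif|jpg|png|psd|tga|tif)$};
my $with_z      = qr{ . \. (?: bmp|gif|jpg|png|psd|tga|tif ) \z }x;

# Hypothetical inputs: one clean name, one with a trailing newline.
my $clean     = 'photo.jpg';
my $unchomped = "photo.jpg\n";

my @got = map { $_ ? 1 : 0 } (
    $clean     =~ $with_dollar,   # 1
    $clean     =~ $with_z,        # 1
    $unchomped =~ $with_dollar,   # 1: '$' tolerates the final newline
    $unchomped =~ $with_z,        # 0: '\z' insists on the true end of string
);

print "@got\n";
# prints: 1 1 1 0
```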



