On Fri Feb 4, 2011 8:02 AM, "Harry Putnam" <rea...@newsguy.com> scribbled:
> One further question. In your formulation shown below:
>
> ,----
> | unless($filename =~ m(.+\.(bmp|gif|jpg|png|psd|tga|tif)$))
> | {
> | print STDERR "The filename {$filename} has an unsupported
> | extension. Skipping...";
> | next;
> | }
> `----
>
> I see that specifying exact possible extensions is good, but don't
> really see what the `m' does there. I'm not that informed on all
> incantations of perl regex but does that not anticipate filenames
> using multiple lines?

The leading 'm' is the "match" operator. It is optional if you use
forward slashes as the regex delimiter. Including an explicit 'm' lets
you use other delimiters instead: a bracketing pair such as '()', '{}'
or '[]', or the same punctuation character at both ends, such as '||'.
Since parentheses are the delimiters in the above regex, the 'm' is
required here. See 'perldoc perlop' and search for "Regexp Quote-Like
Operators".

> My take on `m' (from perlre) is that it basically replaces the meaning
> of ^ and $ from the common start of string and end of string, to start
> and end of any line anywhere in multiple lines.

That is correct for a trailing 'm', i.e. the /m modifier written after
the closing delimiter (as in /^foo$/m). A leading 'm' is just the match
operator and does not change what ^ and $ mean.

> Further it seems that the use of parens for regex delimiters (at least
> in this instance) is somewhat confusing when its thrown in with at
> least 2 other uses of parens, making 3 different kinds of uses in one
> clause.

You can use other characters to make it more readable. However, the
other bracketing delimiters, such as {} and [], also have meanings of
their own within a regular expression, so some overlap of meaning is
unavoidable and the reader has to rely on context.

> And finally the `.+' at the start of the regex seems to allow names
> such as $$.psd or ##.psd or %.psd and the like. Kind of undoing your
> effort to enforce strict extensions by allowing weird and even
> unusable names on the other end.

I think that the '.+' is not necessary. It only requires that at least
one character appear before the period in the file name, and since the
pattern is not anchored at the start, a single '.' has the same
effect. Leaving it off entirely would mean that names such as '.jpg'
and '.png' would match.

> Would `\w+' have served better or am I really missing the boat all
> the way round?

It all depends upon what you want to match and what you want to
exclude. Because the pattern is not anchored at the start,
\w+\.(jpg|png|...) only requires a word character (letter, digit or
underscore) immediately before the period: it rejects '$$.psd' and
'%.psd', but still accepts 'my-photo.jpg'. To require that the whole
name consist of word characters, anchor it: ^\w+\.(jpg|png|...)$

The only improvement I could suggest (and it makes only a small
difference in speed) is to make the grouping parentheses
non-capturing:

    m(.\.(?:bmp|gif|jpg|png|psd|tga|tif)$)

You can also use the extended syntax (the /x modifier, under which
whitespace in the pattern is ignored) to make your regex more
readable. Using the \z zero-width assertion instead of $ also makes the
match stricter: $ can match before a trailing newline, while \z matches
only at the very end of the string.

    m{ . \. (?: bmp|gif|jpg|png|psd|tga|tif ) \z }x
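For anyone who wants to see the differences side by side, here is a
small sketch that runs the variants discussed in this thread against a
few sample names. The subroutine names (has_supported_ext, is_word_name,
dollar_anchor) and the sample file names are made up for illustration;
it assumes any reasonably recent perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: at least one character before the period,
# extension must be at the very end of the string (\z, /x syntax).
sub has_supported_ext {
    my ($name) = @_;
    return $name =~ m{ . \. (?: bmp|gif|jpg|png|psd|tga|tif ) \z }x;
}

# Hypothetical stricter variant: the whole name must be word characters.
sub is_word_name {
    my ($name) = @_;
    return $name =~ m{ \A \w+ \. (?: bmp|gif|jpg|png|psd|tga|tif ) \z }x;
}

# Same as the first check, but anchored with $ instead of \z,
# so it also accepts a name that ends in a newline.
sub dollar_anchor {
    my ($name) = @_;
    return $name =~ m(.\.(?:bmp|gif|jpg|png|psd|tga|tif)$);
}

my @names = ('photo.jpg', '.jpg', '$$.psd', 'my-photo.png', "x.gif\n", 'notes.txt');

for my $name (@names) {
    (my $shown = $name) =~ s/\n/\\n/g;    # make a trailing newline visible
    printf "%-14s \\z=%-3s word=%-3s \$=%s\n",
        $shown,
        has_supported_ext($name) ? 'yes' : 'no',
        is_word_name($name)      ? 'yes' : 'no',
        dollar_anchor($name)     ? 'yes' : 'no';
}
```

Running it should show that '.jpg' is rejected by every variant, that
'$$.psd' and 'my-photo.png' pass the loose patterns but not the \w+
one, and that "x.gif\n" passes only the version anchored with $.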