On Apr 9, Jason Larson said:

>  $extension =~ s/(^.+\.?)([^\.]*)$/$2/;
>
>Your regex will fail for a couple of different reasons:
[snip]

Your regex will fail, too.  Here's why... assuming $extension is
"foobar.txt", here is how the regex matches:

  ^       matches the beginning of the string
  .+      matches the entire string
  \.?     matches zero periods (since it can match zero or one)
  [^\.]*  matches zero non-periods
  $       matches the end of the string
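
If you want to check that trace yourself, here is a minimal sketch that
runs the quoted substitution on a copy of the sample name.  ($after is
just a name made up for this example.)

```perl
use strict;
use warnings;

# Run the quoted substitution on a copy of the sample filename.
my $extension = "foobar.txt";
(my $after = $extension) =~ s/(^.+\.?)([^\.]*)$/$2/;

# The first group swallowed the whole string, so $2 is empty and the
# entire string is replaced with nothing.
print "after: [$after]\n";
```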

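So, even though the string DOES have a . in it, the \.? never gets to
match it: .+ has already swallowed the whole string, and nothing forces
it to back up.  The substitution succeeds, but $2 is empty, so $extension
ends up as "".  The File::Basename module is far more thorough.  If,
however, you insist on using a regex instead, try something like this: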

  ($extension) = $filename =~ /.*\.(.*)/s;

The .* goes through the ENTIRE string, and then the \. forces the regex to
back up to the last "." in the string; then the (.*) captures everything
after that last ".".  I have added the /s modifier in case the filename
contains newlines (which is not a crime).

Notice that $extension is in parentheses -- this creates a LIST on the
left-hand side of the assignment.  That means the regex is evaluated in
list context.  Thus, should the regex FAIL, it returns an EMPTY list,
which means $extension becomes undef.  A value of undef indicates there
was NO extension at all; that is different from "foo.", which matches and
has an extension of "" (the empty string).
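
To see the three outcomes side by side, here is a small sketch.  The
last_extension() wrapper and the sample names are made up for the
example; the regex is the one from the text.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex, so it can be tried on several
# names.  Returns undef when the name has no "." at all.
sub last_extension {
    my ($filename) = @_;
    my ($extension) = $filename =~ /.*\.(.*)/s;   # list context: captures
    return $extension;
}

for my $filename ("foobar.txt", "foo.", "foo") {
    my $extension = last_extension($filename);
    printf "%-10s => %s\n", $filename,
        defined $extension ? "\"$extension\"" : "undef";
}

# Without the parentheses the match is in scalar context, and you get
# its truth value instead of the capture.
my $matched = "foobar.txt" =~ /.*\.(.*)/s;
print "scalar context: $matched\n";
```

The loop should report "txt", then "", then undef; the scalar-context
line shows why the parentheses around $extension matter.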

BUT!

What if $filename is "/foo/bar.blat/gunk"?  Oops.  We get $extension being
"blat/gunk".  That's probably a mistake.  This means we have to restrict
our extension-searching regex to the LAST portion of the path -- the
filename.  But this requires knowing what the path separator is: Unix
uses /, Windows uses \, and classic Mac OS uses :.  Assuming you have the
character in $SEP, then you can construct a better regex:

  ($extension) = $filename =~ /^(?>(?:.*\Q$SEP\E)?).*\.(.*)/;

This works for me.  The (?>...) part of the regex forces its sub-pattern
to match without allowing it to backtrack.  (As a simpler example, the
match "aaab" =~ /(?>a+)ab/ will never succeed, because (?>a+) will never
allow itself to backtrack and give up one of the "a"s it matches.)  The
(?>(?:.*\Q$SEP\E)?) part of the regex matches ALL of the filename up to
the last occurrence of its path-separator.  Then the .* matches the rest,
the actual name (and extension) of the file.  The \. requires .* to back
up to the last ".", and then (.*) matches and captures the extension.
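
Here is a rough sketch of both points: the atomic group on its own, and
the path-aware regex on a directory name that contains a ".".  It
assumes $SEP is "/" (Unix); ext_after_sep() and the sample paths are
made up for the example.

```perl
use strict;
use warnings;

# Without the atomic group, a+ gives back one "a" and the match succeeds.
print "plain:  ", ("aaab" =~ /a+ab/     ? "match" : "no match"), "\n";
# With it, (?>a+) keeps every "a" it took, so the next "a" can never match.
print "atomic: ", ("aaab" =~ /(?>a+)ab/ ? "match" : "no match"), "\n";

my $SEP = "/";   # assumption: Unix-style paths

# Hypothetical helper: extension of the last path component, or undef.
sub ext_after_sep {
    my ($filename) = @_;
    my ($extension) = $filename =~ /^(?>(?:.*\Q$SEP\E)?).*\.(.*)/;
    return $extension;
}

for my $filename ("/foo/bar.blat/gunk", "/foo/bar.blat/gunk.txt") {
    my $extension = ext_after_sep($filename);
    print "$filename => ", (defined $extension ? $extension : "undef"), "\n";
}
```

The first path should come back undef, since the "." in the directory
name is locked inside the atomic group and can no longer be reached.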

There is only ONE more issue to deal with.  What do you do with a filename
of /foo/bar/blat.txt.bak?  Is the extension "txt", "txt.bak", or "bak"?
Here are solutions for all three possibilities:

  # foo.txt.bak => txt
  ($extension) = $filename =~ /^(?>(?:.*\Q$SEP\E)?)[^.]*\.([^.]*)/;

  # foo.txt.bak => txt.bak
  ($extension) = $filename =~ /^(?>(?:.*\Q$SEP\E)?)[^.]*\.(.*)/;

  # foo.txt.bak => bak
  ($extension) = $filename =~ /^(?>(?:.*\Q$SEP\E)?).*\.(.*)/;
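
A quick sketch that runs all three on the same path, again assuming
$SEP is "/" (the variable names $first, $all and $last are made up):

```perl
use strict;
use warnings;

my $SEP      = "/";   # assumption: Unix-style paths
my $filename = "/foo/bar/blat.txt.bak";

# Same three patterns as above, applied to one path.
my ($first) = $filename =~ /^(?>(?:.*\Q$SEP\E)?)[^.]*\.([^.]*)/;
my ($all)   = $filename =~ /^(?>(?:.*\Q$SEP\E)?)[^.]*\.(.*)/;
my ($last)  = $filename =~ /^(?>(?:.*\Q$SEP\E)?).*\.(.*)/;

print "first: $first\n";
print "all:   $all\n";
print "last:  $last\n";
```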

Of course, you needn't use a single regex like this.  You could use the
split() function instead.  Using the following to isolate the NAME from
the path:

  $name = (split /\Q$SEP\E/, $filename)[-1];

we have these three solutions:

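  # A LIMIT of -1 keeps a trailing empty field, so "foo." yields ""
  # rather than undef.

  # foo.txt.bak => txt
  $extension = (split /\./, $name, -1)[1];

  # foo.txt.bak => txt.bak
  $extension = (split /\./, $name, 2)[1];

  # foo.txt.bak => bak  (the count check keeps a name with no "." from
  # being reported as its own extension)
  @parts = split /\./, $name, -1;
  $extension = @parts > 1 ? $parts[-1] : undef;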
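
One thing to watch with split(): with no LIMIT it drops trailing empty
fields, and a name with no "." comes back as a single field holding the
whole name.  Both matter if you want "foo." to give "" and "foo" to give
undef.  A small sketch (field_count() is made up for the example):

```perl
use strict;
use warnings;

# Count the fields split() returns; an undefined $limit means "no LIMIT".
sub field_count {
    my ($name, $limit) = @_;
    my @fields = defined $limit ? split(/\./, $name, $limit)
                                : split(/\./, $name);
    return scalar @fields;
}

printf "%-6s no LIMIT: %d\n", "foo.", field_count("foo.");      # empty field dropped
printf "%-6s LIMIT -1: %d\n", "foo.", field_count("foo.", -1);  # empty field kept
printf "%-6s LIMIT -1: %d\n", "foo",  field_count("foo", -1);   # just the name itself
```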

Or you could change this second split() to a regex:

  # foo.txt.bak => txt
  ($extension) = $name =~ /\.([^.]*)/;

  # foo.txt.bak => txt.bak
  ($extension) = $name =~ /\.(.*)/;

  # foo.txt.bak => bak
  ($extension) = $name =~ /.*\.([^.]*)/;
  # or
  # ($extension) = $name =~ /\.([^.]*)$/;

NOW THAT I HAVE SHOWN YOU ALL THE WORK YOU HAVE TO DO TO GET IT RIGHT...

Please use File::Basename.  Save us all the headache.
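
For comparison, here is a minimal sketch with fileparse() from
File::Basename.  The suffix pattern qr/\.[^.]*/ is just one choice
("last dot onward"), the sample paths are made up, and the results
assume you are on a system where the path separator is "/".

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# fileparse() returns the name, the directory part, and whatever suffix
# matched one of the patterns you passed in.
my ($name, $dirs, $suffix) = fileparse("/foo/bar/blat.txt.bak", qr/\.[^.]*/);
print "name=$name dirs=$dirs suffix=$suffix\n";

# A "." in a directory name is left alone.
my (undef, undef, $none) = fileparse("/foo/bar.blat/gunk", qr/\.[^.]*/);
print "suffix=[$none]\n";
```

Note two differences from the regexes above: the suffix comes back with
its leading ".", and a name with no extension gives an empty string
rather than undef.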

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
[  I'm looking for programming work.  If you like my work, let me know.  ]



