I have an HTML file and would like to extract the image file names and 
extensions:

my $content = qq|<img src="e:\somedir\otherdir\2_image.jpg">
aölkjd oiae lkajf lksjfkjs df<br><img src="http://wlaskjfd.sdlkj/sdlk/LKJ_slkdjf_lkdjfslkj.gif">|;

Image file names may contain numbers, letters or underscores.

my %imageextension;
while ($content =~ /<img src="(.*?)([a-zA-Z0-9_]+)\.(\w{3})">/g) {
    print "yep!\n";
    $imageextension{$2} = $3;
}

foreach (keys %imageextension) {
    print "$_: $imageextension{$_}\n";
}

Why does this code correctly extract "LKJ_slkdjf_lkdjfslkj" for the .gif 
image, but only "_image", and not "2_image", for the .jpg image?

I thought of adding "\\|\/" in the regexp after "(.*?)", but then the first 
image is not extracted at all.
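
One thing that may be worth checking is how the backslashes in the test 
string are read. Inside qq|...|, Perl treats a backslash followed by digits 
as an octal escape, so "\2" would be stored as a single control character 
and the "2" would never reach the regexp. The following is only a sketch 
under that assumption: the path pics\2_image.jpg and the helper 
extract_image are made up for illustration, and the pattern is the one from 
above.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the pattern from the question once and
# returns (file name, extension), or an empty list if nothing matches.
sub extract_image {
    my ($html) = @_;
    if ($html =~ /<img src="(.*?)([a-zA-Z0-9_]+)\.(\w{3})">/) {
        return ($2, $3);
    }
    return;
}

# qq|| interpolates: "\2" becomes the octal escape chr(2).
my $interpolated = qq|<img src="pics\2_image.jpg">|;

# q|| does not interpolate: the backslash and the "2" are kept as typed.
my $literal = q|<img src="pics\2_image.jpg">|;

for my $html ($interpolated, $literal) {
    my ($name, $ext) = extract_image($html);
    print "name=$name ext=$ext\n";
}
```

If the two lines of output differ, the quoting of the test string rather 
than the regexp is the likely cause; reading the HTML from a file, or 
using q|| or a single-quoted here-document for test data, would sidestep it.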

Birgit Kellner
