I have an HTML file and would like to extract the image file names and extensions. Image file names may contain numbers, letters or underscores.

    my $content = qq|<img src="e:\somedir\otherdir\2_image.jpg"> aölkjd oiae lkajf lksjfkjs df<br><img src="http://wlaskjfd.sdlkj/sdlk/LKJ_slkdjf_lkdjfslkj.gif">|;

    my %imageextension;
    while ($content =~ /<img src="(.*?)([a-zA-Z0-9_]+)\.(\w{3})">/g) {
        print "yep!\n";
        $imageextension{$2} = $3;
    }
    foreach (keys %imageextension) {
        print "$_: $imageextension{$_}\n";
    }

Why does this code correctly extract "LKJ_slkdjf_lkdjfslkj" for the .gif image, but only "_image", and not "2_image", for the .jpg image?

I thought of adding "\\|\/" to the regexp after "(.*?)", but then the first image is not extracted at all.

Birgit Kellner
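One thing worth checking, offered as a sketch rather than a definitive diagnosis: inside a double-quoted `qq|...|` string, Perl treats `\2` as an octal escape for the character chr(2), so the digit may already be gone before the regexp runs. The example below compares `qq` with single-quoted `q` on a shortened, made-up path (`dir\2_image.jpg`). The helper `image_extensions` is hypothetical, and its use of `[^"]*?` instead of `(.*?)` is an assumption made here to keep each match inside one `src` attribute.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a hash of
# image base name => extension for every <img src="..."> found in $html.
sub image_extensions {
    my ($html) = @_;
    my %ext;
    # [^"]*? keeps the lazy prefix inside a single src attribute.
    while ($html =~ /<img src="[^"]*?([a-zA-Z0-9_]+)\.(\w{3})">/g) {
        $ext{$1} = $2;
    }
    return %ext;
}

# In a double-quoted (qq) string, \2 is an octal escape for chr(2).
my $interpolated = qq|<img src="dir\2_image.jpg">|;
# In a single-quoted (q) string, the backslash stays a literal backslash.
my $literal      = q|<img src="dir\2_image.jpg">|;

my %from_qq = image_extensions($interpolated);
my %from_q  = image_extensions($literal);

print join(', ', sort keys %from_qq), "\n";   # prints: _image
print join(', ', sort keys %from_q),  "\n";   # prints: 2_image
```

If that is what happens here, single-quoting the test string (or reading the HTML from a file, where no escape processing takes place) should let the original character class pick up the leading digit.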