Thank you all for your first thoughts and clarifying questions. I'm trying to discard any URL containing a character that is not an upper- or lower-case letter, a digit, or one of the characters $-_.+!*'(), . I realize that some other characters can be used in special circumstances, but I don't have to allow for any of these in my program.

I thought that my Perl statement:

    if ($url =~ /^[^A-Za-z0-9$-_.+!*'(),]+$/) {   # if there are any invalid URL characters in the string
        # Remember, special regex characters lose their meaning inside []
        print "Invalid character in URL at line $.: $url\n";
        next;
    }

is saying: if the variable $url contains any characters not in the set [A-Za-z0-9$-_.+!*'(),], print "Invalid ...".

So, I think I need help in two areas: Do I have my logic backwards, because I'm trying to match any character in a variable? And how do I write the match statement to do what I want?

Thanks again for all your help and suggestions.

-Kevin

>>> Wiggins d Anconia <[EMAIL PROTECTED]> 01/09/04 05:01PM >>>
> I'm trying to throw out URLs with any invalid characters in them, like
> '@'. According to http://www.ietf.org/rfc/rfc1738.txt :
>     Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
>     reserved characters used for their reserved purposes may be used
>     unencoded within a URL.
>
> I'd like to throw out a URL like
> 'http://jncicancerspectrum.oupjournals.org/cgi/content/full/jnci;91/3/252'
> (even though this one works perfectly fine. Go figure.). I've tried:
>     if ($url =~ /^[^A-Za-z0-9$-_.+!*'(),]+$/) {   # if there are any invalid URL characters in the string
>         # Remember, special regex characters lose their meaning inside []
>         print "Invalid character in URL at line $.: $url\n";
>         next;
>     }
>
> According to my Camel, special regex characters are supposed to lose
> their special functioning inside []. Yet, that obviously isn't true for
> '-' used to separate the start and end of a range. I thought the fourth
> '-' at '$-' was probably indicating a range, so I tried to escape it by
> preceding it with a backslash or '\Q' but both gave strange errors about
> uninitialized values in concatenations.
>
> Any suggestions? Thanks for your help and thoughts.
>

Did you mean to leave out those characters the RFC mentions are reserved for some schemes? "The characters ";", "/", "?", ":", "@", "=" and "&" are the characters which may be reserved for special meaning within a scheme." They should be in the class as well, since you are negating it, right? Just trying to understand completely so I don't throw you off with any dumb remarks...

http://danconia.org
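
For what it's worth, here is a minimal sketch of the check being discussed, not a definitive answer. The pattern /^[^...]+$/ only matches when every character in $url is outside the set, so dropping the ^, + and $ makes it match as soon as any single character is outside the set. The sketch also escapes "$" and "-" inside the class: an unescaped "$" in a pattern can be interpolated as a Perl variable (for example, "$\-" reads as the variable $\ followed by "-"), which is probably where the uninitialized-value warnings came from. The names first_invalid_char, $strict_bad and $lenient_bad are made up for illustration, and whether to allow the reserved characters is an assumption left as a flag.

```perl
use strict;
use warnings;

# Matches one character that is NOT in the RFC 1738 "safe" set:
# alphanumerics plus $-_.+!*'(),
# "\$" stops Perl from interpolating a variable; "\-" stops a range.
my $strict_bad = qr/[^A-Za-z0-9\$\-_.+!*'(),]/;

# Same set, but also allowing the reserved characters ; / ? : @ = &
# (an assumption: only useful if those should be accepted).
my $lenient_bad = qr{[^A-Za-z0-9\$\-_.+!*'(),;/?:\@=&]};

# Hypothetical helper: returns the first disallowed character,
# or undef if every character in the URL is allowed.
sub first_invalid_char {
    my ($url, $allow_reserved) = @_;
    my $bad = $allow_reserved ? $lenient_bad : $strict_bad;
    return $url =~ /($bad)/ ? $1 : undef;
}

# Usage: report the offending character, in the style of the original check.
for my $url ('http://jncicancerspectrum.oupjournals.org/cgi/content/full/jnci;91/3/252',
             q{abc$-_.+!*'(),},
             'user@host') {
    my $c = first_invalid_char($url);
    print defined $c
        ? "Invalid character '$c' in URL: $url\n"
        : "OK: $url\n";
}
```

Note that with the strict set, any URL containing "http://" is rejected because ":" and "/" are not in the class, which is the point raised in the reply above. Passing a true second argument switches to the set that includes the reserved characters.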