Thank you all for some first thoughts and clarifying questions.

I'm trying to discard any URL containing a character that is not an upper- or
lower-case letter, a digit, or one of the characters "$-_.+!*'(),". I realize
that some other characters can be used in special circumstances, but I don't
have to allow for any of these in my program.

I thought that my perl statement:
         # If there are any invalid URL characters in the string.
         # Remember, special regex characters lose their meaning inside [].
         if ($url =~ /^[^A-Za-z0-9$-_.+!*'(),]+$/) {
            print "Invalid character in URL at line $.: $url\n";
            next;
         }
is saying:
if the variable $url contains any characters not in the set
[A-Za-z0-9$-_.+!*'(),], print "Invalid ..."
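
For comparison, a pattern of the shape ^[^...]+$ succeeds only when every
character of the string lies outside the set, while an unanchored [^...]
succeeds as soon as one such character appears. A minimal sketch, using a
simplified alphanumeric-only class; the helper names all_outside and
any_outside are made up for illustration and are not part of the original
program:

```perl
use strict;
use warnings;

# Anchored form: true only if the WHOLE string consists of characters
# outside the set (simplified here to letters and digits).
sub all_outside {
    my ($s) = @_;
    return $s =~ /^[^A-Za-z0-9]+$/ ? 1 : 0;
}

# Unanchored form: true if AT LEAST ONE character is outside the set.
sub any_outside {
    my ($s) = @_;
    return $s =~ /[^A-Za-z0-9]/ ? 1 : 0;
}

for my $s ('abc', 'abc@def', '@@@') {
    printf "%-8s all_outside=%d any_outside=%d\n",
        $s, all_outside($s), any_outside($s);
}
```

Under these assumptions, 'abc@def' is caught only by the unanchored form,
which is the behaviour the description above asks for.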

So, I think I need help in two areas: do I have my logic backwards, because
I'm trying to match any character in a variable? And how do I write the match
statement to do what I want?
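
One possible way to write such a match, as a sketch rather than a definitive
answer: build the class from a non-interpolating q{} string passed through
quotemeta, so that neither '$' nor '-' keeps a special meaning, and leave off
the ^ and +$ anchors. The names $unreserved, $invalid, and has_invalid_char
are assumptions made for this example. Note that this strict set also rejects
':' and '/', so any URL with a scheme would be flagged.

```perl
use strict;
use warnings;

# Assumption: the allowed punctuation is exactly the list given above.
# q{} does not interpolate, and quotemeta backslash-escapes every non-word
# character, so '$' and '-' are taken literally inside the class.
my $unreserved = quotemeta(q{$-_.+!*'(),});

# Unanchored negated class: matches if ANY character is outside the set.
my $invalid = qr/[^A-Za-z0-9${unreserved}]/;

# Hypothetical helper: returns 1 if $url holds a disallowed character, else 0.
sub has_invalid_char {
    my ($url) = @_;
    return $url =~ $invalid ? 1 : 0;
}

for my $url ('abc-123_x.y', 'user@example', 'a;b') {
    printf "%-14s %s\n", $url, has_invalid_char($url) ? 'invalid' : 'ok';
}
```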

Thanks, again, for all your help and suggestions.

-Kevin

>>> Wiggins d Anconia <[EMAIL PROTECTED]> 01/09/04 05:01PM >>>


> I'm trying to throw out URLs with any invalid characters in them, like
> '@'. According to http://www.ietf.org/rfc/rfc1738.txt :
>    Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
>    reserved characters used for their reserved purposes may be used
>    unencoded within a URL.
> 
> I'd like to throw out a URL like
> 'http://jncicancerspectrum.oupjournals.org/cgi/content/full/jnci;91/3/252' 
> (even though this one works perfectly fine. Go figure.). I've tried:
>         # If there are any invalid URL characters in the string.
>         # Remember, special regex characters lose their meaning inside [].
>         if ($url =~ /^[^A-Za-z0-9$-_.+!*'(),]+$/) {
>            print "Invalid character in URL at line $.: $url\n";
>            next;
>         }
> 
> According to my Camel, special regex characters are supposed to lose
> their special functioning inside []. Yet, that obviously isn't true for
> '-' used to separate the start and end of a range. I thought the fourth
> '-' at '$-' was probably indicating a range, so I tried to escape it by
> preceding it with a backslash or '\Q' but both gave strange errors about
> uninitialized values in concatenations.
> 
> Any suggestions? Thanks for your help and thoughts.
> 

Did you mean to leave out the characters that the RFC says are reserved
for some schemes?

"The characters ";", "/", "?", ":", "@", "=" and "&" are the characters
which may be reserved for special meaning within a scheme."

They should be in the class as well, since you are negating it, right?
Just trying to understand completely so I don't throw you off with any
dumb remarks...
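
To make the question concrete, here is a rough sketch of how adding reserved
characters to the negated class changes the outcome for the example URL. The
helper invalid_char_pattern and its $extra parameter are hypothetical, and
which reserved characters to allow is a choice only the original poster can
make:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a pattern that matches any character which is
# not alphanumeric, not in the RFC 1738 "special" list, and not in $extra.
sub invalid_char_pattern {
    my ($extra) = @_;                        # extra literal characters to allow
    my $allowed = quotemeta(q{$-_.+!*'(),}) . quotemeta($extra);
    return qr/[^A-Za-z0-9${allowed}]/;
}

my $url = 'http://jncicancerspectrum.oupjournals.org/cgi/content/full/jnci;91/3/252';

# Allowing every reserved character accepts the URL, ';' included.
my $all_reserved = invalid_char_pattern(q{;/?:@=&});
# Allowing only '/' and ':' still rejects it because of the ';'.
my $some_reserved = invalid_char_pattern(q{/:});

print "all reserved allowed: ",
    ($url =~ $all_reserved ? "rejected" : "accepted"), "\n";
print "only / and : allowed: ",
    ($url =~ $some_reserved ? "rejected" : "accepted"), "\n";
```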

http://danconia.org 


