According to Jessica Biola:
> Is there a way to match spaces in a regex inside a
> url_rewrite_rules parameter, so that you could just
> do:
>
> url_rewrite_rules: (.*)[:space:](.*) \1%20\2
>
> (of course, you'd have to repeat this same rule
> multiple times to handle multiple spaces)  I tried the
> above rule and it didn't seem to work.  Characters
> inside the [brackets] were taken literally, and thus,
> the first s, p, a, c, or e were replaced with %20.
>
> This may seem like a wimpy work-around, but it could
> be done without the need to modify any code
> internally, keeping htdig RFC2396 compliant at the
> same time.
>
> So if you could help me with the regex I would
> appreciate it.
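
On the symptom described above (the characters inside the brackets being taken literally), here is a small sketch that reproduces it outside htdig. It assumes a POSIX <regex.h> is available; posix_matches is a hypothetical helper written for this illustration, not htdig code, and htdig's own regex code may behave differently.

```cpp
#include <cstddef>
#include <regex.h>

// Hypothetical helper for illustration only, not part of htdig.
// Returns true if the POSIX extended regex `pattern` matches anywhere
// in `text`, and false if it does not match or fails to compile.
bool posix_matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                       // pattern did not compile

    bool found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

With a typical POSIX regex library, posix_matches("[:space:]", "s") should be true and posix_matches("[:space:]", " ") false, because the single-bracket form is read as a set of the characters ':', 's', 'p', 'a', 'c' and 'e'. The double-bracket form, posix_matches("[[:space:]]", "a b"), should match the space instead.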

Interesting idea, but there are a few reasons it won't work:

1) As you discovered, [:space:] on its own isn't treated as a character
   class.  In POSIX regex syntax a class name is only recognized inside
   a bracket expression, i.e. [[:space:]]; written with single brackets
   it is just a set of the characters ':', 's', 'p', 'a', 'c' and 'e',
   which is why the first of those letters was replaced.  Whether
   [[:space:]] works at all may depend on which regex code ends up
   being used, as some C libraries may not implement it.  Even if your
   regex code does implement it, see point 3.

2) You can't use just a space in the regular expression, either with or
   without the brackets, because url_rewrite_rules is parsed as a string
   list, not a quoted string list, so there's no way to embed a literal
   space in your regular expression.

3) Even if you could get around the two problems above, it still
   wouldn't work, because the URL class doesn't do the rewriting until
   AFTER it has parsed the URL, and so the spaces are already stripped
   out in accordance with RFC2396.

By the way, any trick you'd use to make htdig handle spaces within URLs
would be a violation of RFC2396, regardless of whether it required code
changes or just config file changes.  The standard says spaces should be
stripped out.  The way most web browsers handle spaces within URLs is
also a violation of RFC2396.  The question is whether/how we get htdig
to do likewise.

The change I had suggested previously, which Joe Jah wrote into a patch,
mostly does things correctly.  Only one bit is missing.  All white space
characters other than the space itself are stripped out anywhere, and
the chop() call strips off trailing spaces, but there's nothing in that
patch to strip off leading spaces, which is what caused grief in Joe's
test of his patch.

What you could do is, in addition to Joe's patch, add the following at
the very start of URL::URL(char *ref, URL &parent)...

    while (*ref == ' ')
        ref++;

and this at the very start of URL::parse(char *u)...

    while (*u == ' ')
        u++;

before ref or u is assigned to the String "temp".

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
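
For anyone wanting to try the white space handling described in the message outside of htdig, here is a stand-alone sketch of the combined behaviour: skip leading spaces, drop other white space anywhere, and chop trailing spaces. It is not Joe's patch and not htdig code. clean_url_text is a hypothetical name, and std::string stands in for htdig's String class, so treat it as an approximation under those assumptions.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-alone illustration, not htdig code.
// Approximates the white space handling discussed in the message.
std::string clean_url_text(const char *ref)
{
    // The suggested addition: skip leading spaces before anything else.
    while (*ref == ' ')
        ref++;

    std::string temp;
    for (const char *p = ref; *p != '\0'; ++p) {
        unsigned char c = static_cast<unsigned char>(*p);
        // Drop tabs, newlines and other white space anywhere in the
        // text, but keep ordinary spaces at this stage.
        if (c != ' ' && std::isspace(c))
            continue;
        temp += *p;
    }

    // Rough equivalent of chopping off trailing spaces.
    while (!temp.empty() && temp[temp.size() - 1] == ' ')
        temp.erase(temp.size() - 1);

    return temp;
}
```

For example, clean_url_text("  foo bar.html  ") gives "foo bar.html": the leading and trailing spaces go, while the embedded space is kept for later handling. Like the two while loops in the message, the sketch only skips spaces that come before any other character, so a space that follows a leading tab is not removed.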