DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://issues.apache.org/bugzilla/show_bug.cgi?id=27802>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=27802

EncodeURLTransformer encodes off site links

------- Additional Comments From [EMAIL PROTECTED] 2004-03-26 14:32 -------

Updated EncodeURLTransformer and ElementAttributeURLMatcher to support include patterns for URLs.

Default include-url pattern is ".*";

Default exclude-url pattern is

  "http:.*|https:.*|ftp:.*|#.*|mailto:.*|news:.*|" +
  "nntp:.*|telnet:.*|prospero:.*|z39.50s:.*|z39.50r:.*|" +
  "cid:.*|mid:.*|vemmi:.*|service:.*|imap:.*|nfs:.*|" +
  "acap:.*|rtsp:.*|tip:.*|pop:.*|data:.*|dav:.*|gopher:.*|" +
  "opaquelocktoken:.*|sip:.*|sips:.*|tel:.*|fax:.*|" +
  "modem:.*|ldap:.*|soap.beep:.*|soap.beeps:.*|afs:.*|" +
  "xmlrpc.beep:.*|xmlrpc.beeps:.*|urn:.*|go:.*|h323:.*|" +
  "ipp:.*|tftp:.*|mupdate:.*|pres:.*|im:.*|wais:.*|" +
  "file:.*|tn3270:.*|mailserver:.*";

and matches all URI schemes listed in the IANA registry:
http://www.iana.org/assignments/uri-schemes

Sitemap usage:

  <map:transformer logger="sitemap.transformer.encodeURL"
      name="encodeURL"
      src="org.apache.cocoon.transformation.EncodeURLTransformer">
    <exclude-url>http:.*|#.*|myprotocol.*</exclude-url>
    <include-url>.*</include-url>
  </map:transformer>

The main change in default behaviour is that EncodeURLTransformer will only rewrite relative URLs, like "foo/bar/index.xml". It will not rewrite fully qualified URLs starting with an IANA-registered scheme, nor document fragment URLs, like "#some-reference".

I am not sure how useful the <include-url> pattern match is; it is there merely for completeness.

What is it useful for? Well, I have to support many legacy HTML documents, which have been published and must not be altered. These documents may contain links to remote resources. If these links are URL-encoded (that is, a session id is appended to them), they stop working, because the remote side returns a 404 error, document not found.
Example: http://www.cnn.com/;jsessionid=35kjsjkj54kslfjdlkj6l5j6lsjf

A possible workaround would be to transform such links to some private namespace prior to URL encoding and transform them back to hrefs after URL encoding.

At least one other person has had the same issue:
http://marc.theaimsgroup.com/?l=xml-cocoon-users&m=107416883114549&w=2

What do you think?
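
To make the include/exclude decision concrete, here is a rough sketch of the matching rule. It is not the code from the patch: the class name UrlEncodeFilter and the method shouldEncode are hypothetical, java.util.regex stands in for whatever regexp library the transformer actually uses, and it assumes a URL is rewritten only when it fully matches the include pattern and does not fully match the exclude pattern. The exclude list is shortened for readability.

```java
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the include/exclude rule; not the Cocoon class. */
final class UrlEncodeFilter {
    private final Pattern include;
    private final Pattern exclude;

    UrlEncodeFilter(String includeRegex, String excludeRegex) {
        this.include = Pattern.compile(includeRegex);
        this.exclude = Pattern.compile(excludeRegex);
    }

    /**
     * Assumed rule: encode only if the whole URL matches the include
     * pattern and does not match the exclude pattern.
     */
    boolean shouldEncode(String url) {
        return include.matcher(url).matches()
                && !exclude.matcher(url).matches();
    }

    public static void main(String[] args) {
        // Shortened version of the default exclude-url pattern.
        UrlEncodeFilter filter = new UrlEncodeFilter(
                ".*", "http:.*|https:.*|ftp:.*|#.*|mailto:.*");

        System.out.println(filter.shouldEncode("foo/bar/index.xml"));   // true
        System.out.println(filter.shouldEncode("http://www.cnn.com/")); // false
        System.out.println(filter.shouldEncode("#some-reference"));     // false
    }
}
```

Under these assumptions the relative link is passed on for session encoding, while the off-site link and the fragment link are left untouched, which is the behaviour described above.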
