https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=22223
--- Comment #21 from David Cook <[email protected]> ---

Actually, I'm going back to thinking that the "url" filter should be removed in this case.

The 2005 URI standard (STD 66) says at https://tools.ietf.org/html/std66#section-2.4 that "the only time when octets within a URI are percent-encoded is during the process of producing the URI from its component parts. This is when an implementation determines which of the reserved characters are to be used as subcomponent delimiters and which can be safely used as data. Once produced, a URI is always in its percent-encoded form." That is the only safe time to do the URI encoding. And if you look at the "uri" filter for Template Toolkit at http://www.template-toolkit.org/docs/manual/Filters.html#section_uri, that's exactly how they do percent-encoding for URIs: building a URI from its component parts and escaping each part at that point.

The "url" filter for encoding whole URLs in Template Toolkit is highly problematic. I can certainly see the appeal. After all, say someone submits a URL to a web form and you want to show them their URL on the response page. Technically speaking, you should decompose the URL and then rebuild it from its component parts. The "url" filter is a convenient mechanism, but it seems technically incorrect.

So maybe we shouldn't use the "url" filter... but we need to do *something*. The 2005 URI standard is dogmatic. Practically speaking, Koha is given whole URLs by library staff members; it's not building URIs itself from component parts. In theory, the library staff members should be passing in URLs that are already encoded, but in practice that is unlikely to happen unless they're copying/pasting from somewhere else, and even then it may be hit or miss. In theory, we shouldn't be encoding the URL at the template level, since it should already have been encoded when it was created... but, as above, we can't trust that.
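To illustrate what "encoding at production time" looks like, here's a rough Python analogue of the component-parts approach (urllib.parse standing in for Template Toolkit's "uri" filter; the hostname, path, and parameter names below are made up for the example):

```python
from urllib.parse import quote, urlencode

# Building a URI from its component parts, percent-encoding each part as
# it is assembled -- the only point at which STD 66 says encoding should
# happen. (Illustrative only; these values are not real Koha URLs.)
base = "https://catalog.example.org"
title = "Dogs & cats: a study"              # data containing reserved chars
path = "/search/" + quote(title, safe="")   # encode the path segment
query = urlencode({"biblionumber": "123", "q": title})  # encode query values
url = f"{base}{path}?{query}"
print(url)
```

Because the implementation knows which characters are delimiters and which are data at the moment of assembly, there's never any ambiguity; the ambiguity only appears when you're handed a complete URL string after the fact.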
Perhaps we should implement our own filter that first parses the URI, decodes its component parts, and then re-encodes them. Of course, https://tools.ietf.org/html/std66#section-2.4 also says: "Implementations must not percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent-encoding, or vice versa in the case of percent-encoding an already percent-encoded string."

So, technically speaking, this is kind of unsolvable in terms of strict adherence to the standard. The problem, of course, is the human element. If we were mechanically building URLs, encoding them, sending them, decoding them, and using them, it would all be fine. The problem is human input. With the OPAC, we might accept a URL, but it would just be text data. With the staff interface, we're actually using it in HTML...

The most practical option, in my mind, is simply not to use the "url" filter on this field, because I think we more or less have to assume that the librarian put in a properly encoded URL in the first place.
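For what it's worth, the "parse, decode, re-encode" filter floated above could look something like this sketch (Python's urllib used purely for illustration; this is not Koha code, and the function name is made up):

```python
# Hypothetical sketch of a "parse, then decode, then re-encode" URL
# normalization filter. It makes a mixed or unencoded URL safe without
# double-encoding an already-encoded one.
from urllib.parse import urlsplit, urlunsplit, quote, unquote

def normalize_url(url):
    """Decode each component, then percent-encode it exactly once."""
    parts = urlsplit(url)
    path = quote(unquote(parts.path), safe="/")
    query = quote(unquote(parts.query), safe="=&")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))

# An unencoded space gets encoded:
print(normalize_url("http://example.org/a b"))    # http://example.org/a%20b
# An already-encoded URL passes through unchanged rather than becoming %2520:
print(normalize_url("http://example.org/a%20b"))  # http://example.org/a%20b
```

Note this doesn't escape the standard's warning: a literal "%20" that a librarian actually meant as data is indistinguishable from an encoded space, which is exactly the human-input ambiguity described above.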
