I wanted to see if anyone has feedback or suggestions
regarding a change to properly encode the href
attribute in some of the NetUI tags, such as anchor,
rewriteURL, image, etc.
Currently we do not perform any encoding on the string
defined for the href attribute of these tags. The developer
is expected to provide an href with its characters escaped
as needed. However, we do encode the names and values
of the NetUI parameter tag (with URLCodec) for multibyte
and other URI characters that need to be escaped.
To make this more consistent, I plan to change the default
behavior to encode the href attributes of the NetUI tags,
such as anchor, rewriteURL, image, etc.
The implementation I have splits a string into components of
scheme, authority, path, query, and fragment, and then uses
a java.net.URI constructor which will escape the characters
in each component as required.
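
To make the approach concrete, here is a rough sketch of the
split-and-rebuild step. It is not the actual NetUI code: the
class and method names are made up for illustration, and the
splitting uses the regular expression from RFC 2396,
Appendix B, which is an assumption about how the components
get separated.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NetUI.
class HrefSplitSketch {
    // Component-splitting regex from RFC 2396, Appendix B.
    // Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    private static final Pattern URI_PARTS = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    static String encodeHref(String href) throws URISyntaxException {
        Matcher m = URI_PARTS.matcher(href);
        if (!m.matches()) {
            return href; // e.g. contains a line break; leave it untouched
        }
        // The multi-argument constructor escapes illegal characters
        // in each component separately.
        URI uri = new URI(m.group(2), m.group(4), m.group(5),
                          m.group(7), m.group(9));
        return uri.toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(encodeHref("index.html?name=a b"));
        // prints index.html?name=a%20b
        System.out.println(encodeHref("http://host/my page.html#top"));
        // prints http://host/my%20page.html#top
    }
}
```
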
However, there is one case that these implementation changes
would not support. It affects the few tags that take an href
but do not contain or use the NetUI parameter tag, such as
rewriteURL.
First, suppose the user has a URL with a reserved URI
character, such as '&', inside a query value, and wants that
character escaped. For example, the user wants the final URI
to carry the string "hot&cold" as a query value, like...
"http://host/index.html?query=hot%26cold".
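If they use a tag with href="index.html?query=hot&cold" we'd
create "http://host/index.html?query=hot&cold", because '&'
is legal in a query and is left alone. If they use
href="index.html?query=hot%26cold" we'd return
"http://host/index.html?query=hot%2526cold", because the
multi-argument URI constructors always escape '%'.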
Also, the java.net.URI constructors do not take a character
set encoding as a parameter. If the href has multibyte
characters in it, we wouldn't encode and escape them
correctly.
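
Both limitations can be reproduced with a small standalone
test. This is only a sketch: the class and method names are
made up, the host and path are the ones from the example
above, and ISO-8859-1 stands in for whatever non-UTF-8
encoding a page might use. As far as I can tell from the
Javadoc, URI.toASCIIString() always escapes non-ASCII
characters as UTF-8, which is why the last two lines differ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of NetUI.
class UriLimitationsDemo {
    // Builds the final URI from an already-split query component.
    static String build(String query) throws URISyntaxException {
        return new URI("http", "host", "/index.html", query, null)
                .toASCIIString();
    }

    // Charset-aware encoding of a single value, for comparison.
    static String encodeValue(String value, String charset)
            throws UnsupportedEncodingException {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) throws Exception {
        // '&' is legal in a query, so it is not escaped.
        System.out.println(build("query=hot&cold"));
        // prints http://host/index.html?query=hot&cold

        // '%' is always escaped, so "%26" is double-encoded.
        System.out.println(build("query=hot%26cold"));
        // prints http://host/index.html?query=hot%2526cold

        // No way to pick the charset: the result is UTF-8 bytes.
        System.out.println(build("query=\u00e9"));
        // prints http://host/index.html?query=%C3%A9

        // URLEncoder lets the caller choose the charset.
        System.out.println(encodeValue("\u00e9", "ISO-8859-1"));
        // prints %E9
    }
}
```
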
Neither of these is an issue for the anchor tag, because
using the parameter tag solves the problem. When we add
an individual parameter to our MutableURI we can encode
it as needed. The issue only arises during construction
from an href that includes a multibyte or reserved
character.
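
For comparison, here is roughly why the parameter case is
easy: the name and value arrive separately, so each can be
encoded on its own before being joined. This sketch does not
use the real MutableURI API. The addParameter helper is
hypothetical, java.net.URLEncoder stands in for URLCodec, and
fragments are ignored for brevity.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not the NetUI MutableURI class.
class ParamAppendSketch {
    static String addParameter(String uri, String name, String value,
                               String charset)
            throws UnsupportedEncodingException {
        // Start the query with '?', or extend an existing one with '&'.
        char separator = uri.indexOf('?') < 0 ? '?' : '&';
        // Name and value are encoded individually, so a reserved
        // character inside the value cannot be mistaken for a delimiter.
        return uri + separator
                + URLEncoder.encode(name, charset) + "="
                + URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addParameter(
                "http://host/index.html", "query", "hot&cold", "UTF-8"));
        // prints http://host/index.html?query=hot%26cold
    }
}
```
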
One option would be to have our own code to handle the
proper encoding of each of the individual components of
the URI following the generic syntax described in RFC 2396.
(http://www.ietf.org/rfc/rfc2396.txt)
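
A minimal sketch of what such code might look like for the
query component, assuming we want to keep existing %XX
escapes as they are and let the caller pick the character
set. The class and method names are hypothetical, and only
the query is handled here. Note that this would avoid the
double-encoding and charset problems, but a literal '&' in
the href would still be treated as a delimiter, so the user
would have to write %26 themselves.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical encoder for the query component only.
class ComponentEncoderSketch {
    // RFC 2396 "mark" characters plus "reserved" characters,
    // all of which are legal in a query.
    private static final String QUERY_SAFE = "-_.!~*'()" + ";/?:@&=+$,";

    static String encodeQuery(String query, String charset)
            throws UnsupportedEncodingException {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < query.length()) {
            char c = query.charAt(i);
            if (isAlnum(c) || QUERY_SAFE.indexOf(c) >= 0
                    || isEscape(query, i)) {
                out.append(c); // legal as-is, or start of an existing %XX
                i++;
            } else {
                // Escape the whole code point using the requested charset.
                int len = Character.charCount(query.codePointAt(i));
                for (byte b : query.substring(i, i + len).getBytes(charset)) {
                    out.append('%');
                    out.append(hex((b >> 4) & 0xF));
                    out.append(hex(b & 0xF));
                }
                i += len;
            }
        }
        return out.toString();
    }

    private static char hex(int nibble) {
        return Character.toUpperCase(Character.forDigit(nibble, 16));
    }

    private static boolean isAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9');
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')
                || (c >= 'A' && c <= 'F');
    }

    // True if position i starts a well-formed %XX escape.
    private static boolean isEscape(String s, int i) {
        return s.charAt(i) == '%' && i + 2 < s.length()
                && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encodeQuery("query=hot%26cold", "UTF-8"));
        // prints query=hot%26cold
        System.out.println(encodeQuery("q=\u00e9", "ISO-8859-1"));
        // prints q=%E9
    }
}
```
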
Other thoughts?
Hope the explanation is clear. Thanks,
Carlin