Richard Kennard wrote:
Michael,
>> 1) java.net.URL is discouraged... I would agree with Alan on this.
Fair enough: I shall remove those methods.
Can you confirm you want the naming convention changed to Url? It's
just that everything else in the package uses uppercase URL (for
legacy reasons, I'm sure). Note that the class is called
URLEncodedQueryString because it models a 'www-form-urlencoded' query
string, not because of the java.net.URL class.
Url does seem to fit the new conventions better. It is also more
readable in my opinion.
> What if a string to be parsed uses ';' as separator, but contains
> '&' chars embedded within it, which are not to be interpreted as
> separators?
When parsing, ALL separators are recognised. So if a string contains a
mix of ';' and '&' both will be recognised. You do not specify the
separator to use at parsing time - only at toString() time.
So, this means that an '&' embedded in a parameter could not be
recognised as part of that parameter when parsing, but it would be
if added through one of the add parameter methods (in the latter
case, it would get encoded into %xy). This sounds wrong to me. I'm not
saying we shouldn't allow the parsing that you describe above, but
just that it should be possible somehow to do a "roundtrip" of
constructing a query piece by piece, outputting the string, and then
parsing the string again later, back into the same query object.
All that's needed is an additional parse() method which specifies the
separator char.
BTW, I meant to also suggest shortening the ParameterSeparator name to
just Separator.
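
To illustrate the roundtrip concern, here is a minimal sketch using only
java.net.URLDecoder. The class SeparatorParseSketch and its parse() and
decode() helpers are hypothetical names made up for this example; they
are not the proposed API. The sketch assumes a query string that uses
';' as its separator and carries an unencoded '&' inside a value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

public class SeparatorParseSketch {

    // Hypothetical helper: splits the query on the given separator regex
    // and returns decoded {name, value} pairs.
    static List<String[]> parse(String query, String separatorRegex) {
        List<String[]> params = new ArrayList<String[]>();
        for (String pair : query.split(separatorRegex)) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as "a=1;;b=2"
            }
            int eq = pair.indexOf('=');
            String name = (eq < 0) ? pair : pair.substring(0, eq);
            String value = (eq < 0) ? "" : pair.substring(eq + 1);
            params.add(new String[] { decode(name), decode(value) });
        }
        return params;
    }

    // Decodes %xy escapes, assuming UTF-8.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String query = "a=1&2;b=3";

        // Recognising every separator splits the value of 'a' in two.
        System.out.println(parse(query, "[&;]").size()); // prints 3

        // Specifying ';' alone keeps "1&2" as a single value.
        System.out.println(parse(query, ";").size());    // prints 2
        System.out.println(parse(query, ";").get(0)[1]); // prints 1&2
    }
}
```

With an explicit separator, the string parses back into the same two
parameters it was built from; with all separators recognised, it
does not.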
> Should we have the possibility to specify the character set, perhaps
> in the toString() method? In my experience, in some parts of the
> world, particularly Asia, other character sets are often used for
> web applications.
Earlier versions of URIBuilder did this, but either Alan or yourself
thought it complicated matters too much. The HTML spec's
recommendation is UTF-8...
http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars
...note that this only applies to URIs - it is quite a separate issue
from which character set is used on the HTML page.
Yes, I suppose that is consistent with the URI spec as well. But the
apidocs should state that UTF-8 is used in order to avoid any doubt.
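
As a small illustration of why the apidocs should spell this out, the
sketch below encodes the same value with two character sets using
java.net.URLEncoder. The class Utf8EncodingSketch and its encode()
wrapper are hypothetical names for this example only, not part of the
proposed API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class Utf8EncodingSketch {

    // Hypothetical wrapper: percent-encodes a value using the named charset.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "cafe" with an acute accent on the 'e'

        // The same character produces different %xy sequences per charset.
        System.out.println(encode(value, "UTF-8"));      // prints caf%C3%A9
        System.out.println(encode(value, "ISO-8859-1")); // prints caf%E9
    }
}
```

A reader decoding the output needs to know which of these was used,
hence the request to document UTF-8 explicitly.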
Thanks
Michael