Re: Request for comments: Bug 6306820

Michael McMahon Tue, 29 May 2007 01:57:47 -0700

Richard Kennard wrote:

Michael,
>> 1) java.net.URL is discouraged... I would agree with Alan on this.

Fair enough: I shall remove those methods.
Can you confirm you want the naming convention changed to Url? It's just that everything else in the package uses uppercase URL (for legacy reasons, I'm sure). Note that the class is called URLEncodedQueryString because it models a 'www-form-urlencoded' query string, not because of the java.net.URL class.

Url does seem to fit the new conventions better. It is also more readable in my opinion.

> What if a string to be parsed uses ';' as separator, but contains '&' chars embedded within it,
> which are not to be interpreted as separators?
When parsing, ALL separators are recognised. So if a string contains a mix of ';' and '&' both will be recognised. You do not specify the separator to use at parsing time - only at toString() time.

So, this means that an '&' embedded in a parameter could not be recognised when parsing, but it would be recognised if added through one of the add parameter methods (in the latter case, it would get encoded into %xy). This sounds wrong to me. I'm not saying we shouldn't allow the parsing that you describe above, but just that it should be possible somehow to do a "roundtrip" of constructing a query piece by piece, outputting the string, and then parsing the string again later, back into the same query object. All that's needed is an additional parse() method which specifies the separator char.
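
To make the roundtrip concern concrete, here is a minimal sketch in Java. It is not the proposed URLEncodedQueryString API: the class QueryRoundTripSketch and its build() and parse() methods are hypothetical names used only for illustration, and the code assumes UTF-8 percent-encoding via java.net.URLEncoder. The point is that a parse() which is told the separator can leave a literal '&' inside a ';'-separated string alone.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the class under review in this thread.
final class QueryRoundTripSketch {

    // Joins name/value pairs with the given separator. Names and values are
    // percent-encoded, so an embedded '&' or ';' becomes %26 or %3B.
    static String build(List<String[]> params, char separator) {
        StringBuilder out = new StringBuilder();
        for (String[] p : params) {
            if (out.length() > 0) {
                out.append(separator);
            }
            out.append(encode(p[0])).append('=').append(encode(p[1]));
        }
        return out.toString();
    }

    // Splits ONLY on the separator the caller names, then decodes each part.
    // Any other would-be separator character is treated as ordinary data.
    static List<String[]> parse(String query, char separator) {
        List<String[]> params = new ArrayList<String[]>();
        int start = 0;
        while (start <= query.length()) {
            int end = query.indexOf(separator, start);
            if (end < 0) {
                end = query.length();
            }
            String pair = query.substring(start, end);
            if (pair.length() > 0) {
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                params.add(new String[] { decode(name), decode(value) });
            }
            start = end + 1;
        }
        return params;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With this sketch, building the single pair ("q", "a&b;c") with ';' as separator gives "q=a%26b%3Bc", and parsing that string with ';' returns the original pair. Parsing the hand-written string "a=x&y;b=2" with ';' yields two parameters, the first with value "x&y", whereas a parser that recognises every separator would split it into three.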

BTW, I meant to also suggest shortening the ParameterSeparator name to just Separator.

> Should we have the possibility to specify the character set, perhaps
> in the toString() method? In my experience, in some parts of the world, particularly Asia,
> other character sets are often used for web applications.
Earlier versions of URIBuilder did this, but either Alan or yourself thought it complicated matters too much. The HTML spec's recommendation is UTF-8...
   http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars
...note that this only applies to URIs - it is a quite separate issue from what character set is used on the HTML page.

Yes, I suppose that is consistent with the URI spec as well. But the apidocs should state that UTF-8 is used in order to avoid any doubt.
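
As a small illustration of why the apidocs should name the character set, the sketch below percent-encodes the same text under two charsets using the standard java.net.URLEncoder. The class name CharsetEncodingSketch and its encode() helper are hypothetical and are not part of the API under discussion.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical illustration only; not part of the proposed API.
final class CharsetEncodingSketch {

    // Percent-encodes text using the named charset, so the caller can
    // compare the bytes that different charsets produce.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }
}
```

For the single character "\u00e9" (e with acute accent), encode("\u00e9", "UTF-8") returns "%C3%A9" while encode("\u00e9", "ISO-8859-1") returns "%E9", so a receiver that guesses the wrong charset will decode a different string.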

Thanks
Michael
