We decided on the following proposal. dougt, rpotts, chak, dmose, gagan, valeski, and 
nhotta attended the meeting.

URIs would accept, and store, only UTF8-encoded strings. Protocols that cannot handle 
UTF8 (HTTP, for example) would access the (proposed) charset attribute on nsIURI 
to convert back to the original string. The charset would be set by the URI creator, 
since the creator has the best charset context. Is nsIURI the right place for the 
charset attribute?
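
To make the round trip concrete, here is a rough sketch in Python rather than in necko's own C++. The names Uri, make_uri and original_bytes are made up for illustration and are not part of nsIURI; the only assumption taken from the proposal is that the creator hands over the charset along with the string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uri:
    """Hypothetical stand-in for a URI object; not the real nsIURI."""
    spec: str      # decoded text; its UTF8 form is what would be stored
    charset: str   # origin charset, supplied by whoever created the URI


def make_uri(raw: bytes, charset: str) -> Uri:
    # The creator knows the charset context, so it decodes once, up front.
    return Uri(spec=raw.decode(charset), charset=charset)


def original_bytes(uri: Uri) -> bytes:
    # A protocol that cannot send UTF8 converts back using the charset attribute.
    return uri.spec.encode(uri.charset)


raw = "/caf\u00e9".encode("iso-8859-1")   # b"/caf\xe9", as typed on a Latin-1 page
uri = make_uri(raw, "iso-8859-1")
utf8_form = uri.spec.encode("utf-8")      # b"/caf\xc3\xa9", what a UTF8 protocol would use
```

A protocol that handles UTF8 would use utf8_form directly; one that does not would call original_bytes(uri) and get back exactly what the creator passed in.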

The current ASCII % encoding would be removed from the internal URI representation. 
Again, this encoding would be pushed out to the protocol level.
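
As a sketch of what "pushed out to the protocol level" could look like, the Python below keeps the URI text unescaped and applies % encoding only at the point where a protocol builds its request. The function name escape_for_wire is hypothetical, and choosing "/" as the only unescaped reserved character is a simplification for the example, not something decided at the meeting.

```python
from urllib.parse import quote


def escape_for_wire(spec: str, charset: str) -> str:
    """Hypothetical protocol-level helper: %-escape the bytes of `spec`
    in the given charset, leaving '/' and unreserved ASCII alone."""
    return quote(spec, safe="/", encoding=charset)


internal = "/caf\u00e9"  # internal representation: no % escapes

# A protocol that needs the original charset escapes those bytes.
as_latin1 = escape_for_wire(internal, "iso-8859-1")

# A protocol that is fine with UTF8 escapes the UTF8 bytes instead.
as_utf8 = escape_for_wire(internal, "utf-8")
```

The same internal string yields different escaped forms depending on which protocol asks, which is why the escaping cannot live in the shared URI representation.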

Currently, necko can create both UTF8-encoded URIs and ASCII URIs. This is a bug; it 
needs to be fixed so that *all* necko URI creation facilities create UTF8 URIs.

This proposal addresses LDAP's immediate need for UTF8 URIs (it is a protocol that can 
handle UTF8 strings), as well as HTTP's need to *not* use UTF8 (the charset attribute 
will allow HTTP to convert back to the original string).

IDNS and future HTTP servers that handle UTF8 are believed to be covered under this 
model.

Migration to this new world would be phased something like the following to minimize 
impact...

First phase:
- The URI charset attribute would be added first, and URI creators would start feeding 
in the charset.
- Necko would provide consistency in URI creation facilities (all UTF8), and 
callers/users expecting non-UTF8 URIs would need to deal w/ the new encoding.
- HTTP would convert out of UTF8 before sending requests (fixes chak's bug).

Second phase:
- ASCII % encoding would be removed from the URL implementation(s), and pushed out to 
the protocols that need it. Callers expecting the encoding would also need to be 
repaired to handle the new UTF8 format.


Jud


