Eric A. Hall wrote:
> One other point, the names which are used in protocol operations (HTTP
> headers and responses) can be given specific rules which can be different
> from the rules that govern URLs. They can also be converted as required.
>
> 1) users will enter URLs in the preferred charset, standards be damned
>
> 2) URLs with IDNs will have to be converted to a canonical encoding
>    whenever they are processed, regardless of whether they are
>    provided in this form by software or the user.
>
> 3) The protocol execution of the URL data is subject to the protocol
>    rules and not the URL rules.
>
> This may mean that a URL is entered as is converted
> to http://<url> by the client for processing is converted to http://<utf8>
> for execution. Similarly, a URL is keyed as mailto:<iso-2022-jp> is
> converted to mailto:<url> is converted to mailto:<ace>.
>
> I would also put the conversion between (1) and (2) at the viewing system,
> and not at the generating system. There is no reasonable expectation that
> HTML coded in a localized text editor will do this conversion, so the only
> place that conversion can occur will be on the rendering system, when the
> URL is parsed. Although it would be beneficial if the generating system
> does this work first, and this should be encouraged, it is an unreasonable
> expectation to assume that they will.

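To make the three stages quoted above concrete, here is a rough Python sketch of how I read them. Everything in it is an assumption for illustration: the function names are hypothetical, the "canonical" URL form is taken to be percent-escaped UTF-8, and Python's built-in `idna` codec stands in for the ACE step, since the message does not say which ACE is meant.

```python
from urllib.parse import quote


def to_unicode(raw: bytes, charset: str) -> str:
    """Stage (1): decode what the user typed in the local charset."""
    return raw.decode(charset)


def to_url_form(host: str) -> str:
    """Stage (2): assumed canonical form, percent-escaped UTF-8."""
    # quote() encodes non-ASCII characters as UTF-8 before escaping;
    # "." is left alone because it is an unreserved character.
    return quote(host, safe="")


def to_ace(host: str) -> str:
    """Stage (3), mailto-style: ASCII-compatible encoding of each label."""
    # The stdlib "idna" codec is a stand-in for whatever ACE is chosen.
    return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    typed = "日本語.jp".encode("iso-2022-jp")   # bytes as keyed by the user
    host = to_unicode(typed, "iso-2022-jp")
    print("http://" + to_url_form(host))       # form used for processing
    print("mailto:user@" + to_ace(host))       # form used on the wire
```

Note that only `to_unicode` needs to know the local charset, which is why the conversion has to happen where that charset is known.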
Thanks for clarifying the overall process. This is beginning to get
manageable now.

I have a question, though. Why do you say that it would be preferable
for the generating system to do the conversion from (1) to (2) where
possible, if later steps cannot assume that the work has already been
done? I would think that the user would prefer to see the unmodified
http://<iso-2022-jp> in his text editor (the rest of the page would also
be in iso-2022-jp), which would let him edit his links easily without
returning to the original generating software.

Bruce
