Re: URI canonicalization

Martin Duerst Mon, 31 Jan 2005 20:22:11 -0800


I have just looked at the text in question in -05.txt,
and read through the discussion. I'll give my comments
here, but they are not specifically on this mail.

First, for me, the goal of having reproducible id comparison
is most important; this is the basic requirement.
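A minimal sketch of what reproducible ID comparison means for a processor that has to decide whether it has seen an entry before. Keying a dictionary on the exact ID string gives every implementation the same answer. The (id, title) tuple, the function name and the "later entry replaces earlier" policy are hypothetical simplifications for illustration, not anything the spec says.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical minimal entry representation: (id, title).
Entry = Tuple[str, str]


def latest_by_id(entries: Iterable[Entry]) -> Dict[str, Entry]:
    """Group entries by ID using exact string equality (dict key lookup).

    Entries whose IDs are identical strings count as the same entry; as an
    illustrative policy only, a later occurrence replaces an earlier one.
    """
    seen: Dict[str, Entry] = {}
    for entry in entries:
        entry_id = entry[0]
        seen[entry_id] = entry
    return seen


entries = [
    ("http://example.org/2005/01/31/a", "First draft"),
    ("http://example.org/2005/01/31/a", "Revised"),
    ("HTTP://example.org/2005/01/31/a", "Same URI to some tools, different ID here"),
]
result = latest_by_id(entries)
print(len(result))                                   # 2
print(result["http://example.org/2005/01/31/a"][1])  # Revised
```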

Second, given that there are an amazing number of things
that can be compared differently once you start (the
URI/IRI specs, the scheme-specific specs, and Unicode
give you a lot of rope), character-by-character
comparison is the only way to go.
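A minimal Python sketch of how much rope Unicode alone gives you. The ID strings are made up for illustration. `unicodedata.normalize`, `str.lower` and `str.casefold` are standard Python, and the comparison helpers are hypothetical. Each alternative rule is one a processor might plausibly pick, and they disagree with each other, while plain code-point comparison gives every processor the same answer.

```python
import unicodedata

# Hypothetical IDs that render identically: precomposed U+00E9 versus
# "e" followed by the combining acute accent U+0301.
nfc_id = "tag:example.org,2005:caf\u00e9"
nfd_id = "tag:example.org,2005:cafe\u0301"

# Hypothetical IDs differing only in case; German sharp s (U+00DF) has no
# single-character uppercase form.
sharp_s_id = "urn:example:stra\u00dfe"
upper_id = "urn:example:STRASSE"


def simple_equal(a: str, b: str) -> bool:
    """Character-by-character (code point) comparison, no normalization."""
    return a == b


def nfc_equal(a: str, b: str) -> bool:
    """Alternative rule: normalize both strings to NFC first."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def lower_equal(a: str, b: str) -> bool:
    """'Case-insensitive' rule based on str.lower()."""
    return a.lower() == b.lower()


def casefold_equal(a: str, b: str) -> bool:
    """'Case-insensitive' rule based on full Unicode case folding."""
    return a.casefold() == b.casefold()


print(simple_equal(nfc_id, nfd_id))          # False
print(nfc_equal(nfc_id, nfd_id))             # True
print(simple_equal(sharp_s_id, upper_id))    # False
print(lower_equal(sharp_s_id, upper_id))     # False
print(casefold_equal(sharp_s_id, upper_id))  # True
```

Two processors that both claim to compare "case-insensitively" still disagree on the second pair, which is exactly the kind of divergence simple string comparison rules out.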

Third, I don't think that the normalization advice in
"3.5.1  Dereferencing Identity Constructs" is extremely
important, but I don't mind if it's there.

I have the following actual editing proposals to hopefully
make this part of the spec a bit clearer:

1) switch sections 3.5.1 and 3.5.2 to make clear that
   comparison of IDs is the most important operation.

2) At the start of "Comparing Identity Constructs", change the
   sentence "Instances of Identity constructs can be compared to
   determine whether an entry or feed is the same as one seen before."
   to "To determine whether an entry or feed is the same as one
   seen before, their Identity Constructs are compared."
   This makes it clear that we are talking about "here is how
   you do it", rather than "here's one way to do it".

3) Change "Processors MUST compare Identity constructs on a
   character-by-character basis in a case-sensitive fashion."
   to "Processors MUST compare Identity constructs on a
   character-by-character basis. For details, see section
   5.3.1, Simple String Comparison, of [RFC3987]."
   We may want to add something about case-sensitivity as
   a note, but it should not be in the main text. There are
   way too many other ways that this could go wrong, in particular
   in a Unicode context.

4) Add a sentence saying something like "Feeds or Entries
   are identical if their IDs compare identical."
   Seems obvious, but isn't stated anywhere.

5) Add a note saying something like "Comparison functions
   provided by many URI classes/implementations make additional
   assumptions about equality that are not true for Identity
   Constructs. Atom processors therefore should use simple
   string functions for comparing Identity Constructs."
   I think such a note could be a good balance to the normalization
   advice.

I understand that in general, we have tried to reduce
implementation advice in the spec as much as possible.
But in my experience, adding such advice or notes is
often a good way to reach better consensus.


Regards,    Martin.

At 02:17 05/01/31, Graham wrote:
>This controversial text is still in:
>
>  Because of the risk of confusion between URIs that would be
>  equivalent if dereferenced, the following normalization strategy is
>  strongly encouraged when generating Identity constructs:
>
>  o  Provide the scheme in lowercase characters.
>  o  Provide the host, if any, in lowercase characters.
>  o  Only perform percent-encoding where it is essential.
>  o  Use uppercase A-through-F characters when percent-encoding.
>  o  Prevent dot-segments appearing in paths.
>  o  For schemes that define a default authority, use an empty
>     authority if the default is desired.
>  o  For schemes that define an empty path to be equivalent to a path
>     of "/", use "/".
>  o  For schemes that define a port, use an empty port if the default
>     is desired.
>  o  Preserve empty fragment identifiers and queries.
>  o  Ensure that all portions of the URI are UTF-8 encoded NFC form
>     Unicode strings.
>
>For starters it's in the "Dereferencing" section for some reason.
>Secondly, no consensus was reached to include it. Tim shrugged when
>no proposal gained consensus, and included it anyway. Thirdly, the
>rationale in the spec doesn't match any of the even-vaguely-valid
>ones given on the list. Fourthly, none of those rationales were
>valid. Fifthly, it's micromanaging. Of all the things we could go
>into great detail telling people how to do, this doesn't even rate.
>I've never seen a feed that has any of the problems this might
>solve.
>
>Please delete it.
>
>Graham
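For reference, here is a minimal Python sketch of part of the quoted generation-side strategy. It covers lowercasing the scheme and host, uppercasing hex digits in percent-escapes, removing dot-segments with the remove_dot_segments algorithm of RFC 3986 section 5.2.4, and normalizing to NFC. It rests on several assumptions and is not proposed spec text. The scheme-specific items (default authority, empty path, default port) and "only percent-encode where essential" are not implemented. Dot-segment removal is applied regardless of scheme. The helper names are hypothetical. The component regex is the one given in RFC 3986 Appendix B; because it records whether "?" and "#" were present, empty queries and fragments are preserved.

```python
import re
import unicodedata
from typing import List

# Component regex from RFC 3986, Appendix B.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")


def _upper_hex(s: str) -> str:
    """Uppercase the hex digits of every %XX escape."""
    return _PCT.sub(lambda m: m.group(0).upper(), s)


def remove_dot_segments(path: str) -> str:
    """remove_dot_segments from RFC 3986, section 5.2.4."""
    inp = path
    out: List[str] = []  # each item is one segment, with its leading "/" if any
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = inp[2:]
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../"):
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (and its leading "/") to the output.
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


def normalize_id(uri: str) -> str:
    """Hypothetical helper applying part of the quoted normalization list."""
    uri = unicodedata.normalize("NFC", uri)
    m = _URI_RE.match(uri)  # this regex matches any string
    scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)
    result = ""
    if scheme is not None:
        result += scheme.lower() + ":"
    if authority is not None:
        userinfo, at, hostport = authority.rpartition("@")
        result += "//" + _upper_hex(userinfo) + at + _upper_hex(hostport.lower())
    result += remove_dot_segments(_upper_hex(path))
    if query is not None:
        result += "?" + _upper_hex(query)
    if fragment is not None:
        result += "#" + _upper_hex(fragment)
    return result


print(normalize_id("HTTP://Example.ORG/a/./b/../c%2fd?#"))
# http://example.org/a/c%2Fd?#
```

Even with such normalization at generation time, processors would still compare the resulting IDs with simple string comparison; normalizing does not replace it.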
