I have just looked at the text in question in -05.txt, and read through the discussion. I'll give my comments here, but they are not specifically on this mail.
First, for me, the goal of having reproducible id comparison is most important; this is the basic requirement.
Second, given that there are an amazing number of things that can be compared differently once you start (the URI/IRI specs, the scheme-specific specs, and Unicode give you a lot of rope), character-by-character comparison is the only way to go.
Third, I don't think that the normalization advice in "3.5.1 Dereferencing Identity Constructs" is extremely important, but I don't mind if it's there.
I have the following actual editing proposals to hopefully make this part of the spec a bit clearer:
1) switch sections 3.5.1 and 3.5.2 to make clear that comparison of IDs is the most important operation.
2) At the start of "Comparing Identity Constructs", change the sentence "Instances of Identity constructs can be compared to determine whether an entry or feed is the same as one seen before." to "To determine whether an entry or feed is the same as one seen before, their Identity Constructs are compared." This makes it clear that we are talking about "here is how you do it", rather than "here's one way to do it".
3) Change "Processors MUST compare Identity constructs on a character-by-character basis in a case-sensitive fashion." to "Processors MUST compare Identity constructs on a character-by-character basis. For details, see section 5.3.1., Simple String Comparison, of [RFC3987]." We may want to add something about case-sensitivity as a note, but it should not be in the main text. There are way too many other ways that this could go wrong, in particular in a Unicode context.
4) Add a sentence saying something like "Feeds or Entries are identical if their IDs compare identical.". Seems obvious, but isn't stated anywhere.
5) Add a note saying something like "Comparison functions provided by many URI classes/implementations make additional assumptions about equality that are not true for Identity Constructs. Atom processors therefore should use simple string functions for comparing Identity Constructs." I think such a note could be a good balance to the normalization advice.
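To illustrate what the note in 5) is getting at, here is a minimal Python sketch. The function name same_identity and the example.org IDs are made up for illustration and do not come from the draft; the point is only that a plain string comparison treats as different several IDs that a URI-aware comparison might well treat as equal.

```python
def same_identity(id_a: str, id_b: str) -> bool:
    # Plain code-point-by-code-point equality: no case folding,
    # no percent-decoding, no Unicode normalization.
    return id_a == id_b


base = "http://example.org/feed/1"  # hypothetical ID

# Variants that URI-aware comparison code might consider equivalent to base.
variants = [
    "HTTP://EXAMPLE.ORG/feed/1",     # scheme and host in uppercase
    "http://example.org:80/feed/1",  # explicit default port
    "http://example.org/./feed/1",   # dot-segment in the path
    "http://example.org/feed/%31",   # "1" percent-encoded
]

for v in variants:
    # Each of these is a different ID under simple string comparison.
    print(v, same_identity(base, v))

# The same visible text in two Unicode normalization forms.
nfc = "http://example.org/caf\u00e9"   # precomposed e-acute (NFC)
nfd = "http://example.org/cafe\u0301"  # "e" plus combining acute accent (NFD)
print(same_identity(nfc, nfd))
```

Every comparison above prints False; only an exact copy of the original string compares as the same ID. The last pair shows why I would rather point to Simple String Comparison than talk about case alone.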
I understand that in general, we have tried to reduce implementation advice in the spec as much as possible. But in my experience, adding such advice or notes is often a good way to reach better consensus.
Regards, Martin.
At 02:17 05/01/31, Graham wrote:
>This controversial text is still in:
>
> Because of the risk of confusion between URIs that would be
> equivalent if dereferenced, the following normalization strategy is
> strongly encouraged when generating Identity constructs:
>
> o Provide the scheme in lowercase characters.
> o Provide the host, if any, in lowercase characters.
> o Only perform percent-encoding where it is essential.
> o Use uppercase A-through-F characters when percent-encoding.
> o Prevent dot-segments appearing in paths.
> o For schemes that define a default authority, use an empty
> authority if the default is desired.
> o For schemes that define an empty path to be equivalent to a path
> of "/", use "/".
> o For schemes that define a port, use an empty port if the default
> is desired.
> o Preserve empty fragment identifiers and queries.
> o Ensure that all portions of the URI are UTF-8 encoded NFC form
> Unicode strings.
>
>For starters it's in the "Dereferencing" section for some reason. Secondly, no consensus was reached to include it. Tim shrugged when no proposal gained consensus, and included it anyway. Thirdly, the rationale in the spec doesn't match any of the even-vaguely-valid ones given on the list. Fourthly, none of those rationales were valid. Fifthly, it's micromanaging. Of all the things we could go into great detail telling people how to do, this doesn't even rate. I've never seen a feed that has any of the problems this might solve.
>
>Please delete it.
>
>Graham
>
>