On Monday, January 31, 2005, at 10:57 PM, Roy T. Fielding wrote:
There is no reason to require any particular comparison algorithm. One application is going to compare them the same way every time. Two different applications may reach different conclusions about two equivalent identifiers, but nobody cares because AT WORST the result is a bit of inefficient use of storage.
The guidance, if any, should simply state that identifier constructs must be unique. It is not our responsibility to prevent people from assigning the same (equivalent) identifiers to two different resources, nor do I care how many errors occur when they violate such a basic requirement.
While we certainly shouldn't be in the business of saying "IDs MUST be compared using the C function 'strcmp'", I think we need to be specific about what it means for an ID to be unique. Given that we're using URIs as IDs, and two URIs can be functionally identical without being characterwise identical, we need to explain what we mean by "unique". If we don't want to mandate a particular algorithm, then we need to get the point across some other way. Perhaps (the first sentence is the existing first sentence):
Instances of Identity constructs can be compared to determine whether an entry or feed is the same as one seen before. The values of two Identity constructs are considered to be the same if a case-sensitive character-by-character comparison would recognize them as identical.
This language doesn't mandate actually performing a "case-sensitive character-by-character comparison", but it should be clear from this language what it means to be the same. If we want to get even further from talking about algorithms, we could go with something like this, but it begins to sound a little strained:
Instances of Identity constructs can be compared to determine whether
an entry or feed is the same as one seen before. The values of two
Identity constructs are considered to be the same if they contain exactly
the same code points in exactly the same order.
If we want to get even more precise, and don't mind getting more wordy:
Instances of Identity constructs can be compared to determine whether an entry or feed is the same as one seen before. The values of two Identity constructs are considered to be the same if, after XML deserialization, but before any other processing such as decoding of percent-encoded characters, they contain exactly the same code points in exactly the same order.
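To make the last wording concrete, here is a rough sketch in Python of what "after XML deserialization, but before any other processing" would mean in practice. It is only an illustration, not proposed spec text: the helper names (id_of, same_identity), the un-namespaced <entry>/<id> elements, and the example.org URIs are all made up for the example.

```python
import xml.etree.ElementTree as ET


def id_of(entry_xml: str) -> str:
    """Return the text of the (hypothetical, un-namespaced) <id> child
    after the XML parser has resolved entity and character references."""
    return ET.fromstring(entry_xml).findtext("id")


def same_identity(a: str, b: str) -> bool:
    """Same code points in the same order. No case folding, no
    percent-decoding, no URI normalization of any kind."""
    return a == b


# Two different XML spellings of the same characters: equal once deserialized.
amp_entity = id_of("<entry><id>http://example.org/a&amp;b</id></entry>")
amp_charref = id_of("<entry><id>http://example.org/a&#38;b</id></entry>")

# Percent-encoding is left alone, so this one is a different identifier.
amp_percent = id_of("<entry><id>http://example.org/a%26b</id></entry>")

# A case difference in the host also makes a different identifier,
# even though the two URIs would usually dereference to the same thing.
upper_host = id_of("<entry><id>http://Example.org/a&amp;b</id></entry>")

print(same_identity(amp_entity, amp_charref))  # True
print(same_identity(amp_entity, amp_percent))  # False
print(same_identity(amp_entity, upper_host))   # False
```

The point of the sketch is that the only step performed before comparing is whatever the XML parser does anyway; everything a URI library might consider "equivalent" beyond that is deliberately ignored.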