[Gen-art] Gen-ART Review of draft-newman-i18n-comparator-06.txt

Spencer Dawkins Mon, 27 Feb 2006 20:48:07 -0800

I was selected as the General Area Review Team reviewer for this specification
(for background on Gen-ART, please see
http://www.alvestrand.no/ietf/gen/art/gen-art-FAQ.html).


Summary:

Review Comments: This document is almost ready for publication as a Proposed
Standard. I have a small number of nittish comments (more than editorial),
but if the authors agree, I believe any of these changes could be RFC Editor
notes. The ones I'd really like to see Brian look closely at are in 3.2,
4.2.1, and 4.2.2.

Thanks,

Spencer

3.2.  Wildcards

Spencer: two minor concerns with the following text:

(1) I'm not sure how the first two sentences work together. Does the first
sentence say "there can only be one wildcard character in the string a
client uses to select a collation", or does "a wildcard" mean something
besides "one wildcard"? The second sentence is my greater confusion, because
I'm reading the first sentence as saying that "aa*aa*" would NOT be OK,
because it has more than one wildcard character, and reading the second
sentence as saying that "aa**aa" would NOT be OK, because it has adjacent
wildcard characters, but it's NOT OK anyway, because it has more than one
wildcard character (whether adjacent or not). Please clue me in.

(2) I would love to see a sentence explaining why the third sentence is
"SHOULD NOT use wildcards" and not "MUST NOT use wildcards". To be honest,
I'm trying to understand why this restriction exists at all (at either
SHOULD NOT or MUST NOT strength), but the absence of SHOULD NOT
qualification doesn't help me with this, and I expect that it would help.
And why is "the server SHOULD select the collation" a SHOULD, and not a
MUST? Mumble.

  The string a client uses to select a collation MAY contain a wildcard
  ("*") character which matches zero or more collation-chars.  Wildcard
  characters MUST NOT be adjacent.  Clients which support disconnected
  operation SHOULD NOT use wildcards to select a collation, but clients
  which provide collation operations only when connected to the server
  MAY use wildcards.  If the wildcard string matches multiple
  collations, the server SHOULD select the collation with the broadest
  scope (preferably international scope), the most recent table
  versions and the greatest number of supported operations.
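To make question (1) concrete, here is a rough sketch of one reading of the wildcard text. It treats each "*" as matching zero or more characters and assumes that several non-adjacent wildcards are allowed. That is only one interpretation of the adjacency rule, and it is exactly the point in question. The function names and sample collation names are hypothetical. The sketch does not model the server's tie-breaking rules (scope, table versions, supported operations).

```python
from __future__ import annotations

import re


def validate_selector(selector: str) -> None:
    """Reject selectors containing adjacent wildcards ("**")."""
    if "**" in selector:
        raise ValueError("wildcard characters MUST NOT be adjacent")


def selector_to_regex(selector: str) -> re.Pattern:
    """Translate a selector into a regex: "*" becomes ".*" (zero or more
    characters) and every other character is matched literally."""
    validate_selector(selector)
    parts = [re.escape(p) for p in selector.split("*")]
    return re.compile(".*".join(parts), re.DOTALL)


def matching_collations(selector: str, registered: list[str]) -> list[str]:
    """Return every registered collation name the whole selector matches."""
    pattern = selector_to_regex(selector)
    return [name for name in registered if pattern.fullmatch(name)]
```

Under this reading, "i;*oct*" is acceptable, and "aa**aa" is rejected by the adjacency check alone.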

3.3.  Ordering Direction

Spencer: this is at the edge of a nit, but "collation-order" and
"collation-sel" haven't been introduced previously, and I'm having to guess
that "sel" is short for "selection", or something. Mumble.

  When used as a protocol element for ordering, the collation name MAY
  be prefixed by either "+" or "-" to explicitly specify an ordering
  direction.  As mentioned previously, "+" has no effect on the
  ordering function, while "-" negates the result of the ordering
  function.  In general, collation-order is used when a client requests
  a collation, and collation-sel is used when the server informs the
  client of the selected collation.
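The quoted rule says that "+" leaves the ordering result alone and "-" negates it. A minimal sketch of that rule follows. It assumes the ordering function's "less"/"equal"/"greater" results are encoded as -1/0/1. It uses a plain octet ordering and a one-entry registry as hypothetical stand-ins for a real collation table.

```python
from __future__ import annotations

from functools import cmp_to_key
from typing import Callable

# Assumed convention for this sketch: an ordering function returns
# -1, 0 or 1, standing in for "less", "equal" and "greater".
Ordering = Callable[[bytes, bytes], int]


def octet_order(a: bytes, b: bytes) -> int:
    """Plain octet-by-octet ordering, used here only as a stand-in."""
    return (a > b) - (a < b)


# Hypothetical registry mapping collation names to ordering functions.
ORDERINGS: dict[str, Ordering] = {"i;octet": octet_order}


def parse_collation_order(spec: str) -> tuple[str, int]:
    """Split an optional leading "+" or "-" off a collation name."""
    if spec[:1] in ("+", "-"):
        return spec[1:], -1 if spec[0] == "-" else 1
    return spec, 1  # no prefix behaves like "+"


def directed(order: Ordering, sign: int) -> Ordering:
    """A "+" prefix (sign 1) keeps the result; "-" (sign -1) negates it."""
    return lambda a, b: sign * order(a, b)


def sort_by(spec: str, items: list[bytes]) -> list[bytes]:
    """Sort items with the named collation in the requested direction."""
    name, sign = parse_collation_order(spec)
    return sorted(items, key=cmp_to_key(directed(ORDERINGS[name], sign)))
```

For example, sort_by("-i;octet", [b"b", b"a", b"c"]) yields the reverse of the "+i;octet" order.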

4.2.1.  Equality

Spencer: I'm confused here (note the trend :-). Is the following text
saying, "MAY return either "error" or "no-match" if the input strings are
not valid character strings ..."? The current text doesn't seem to say what
happens when the input strings aren't valid and the equality function
doesn't return "error", which is only a MAY strength ("so don't be surprised
when your server does this").

  The equality function always returns "match" or "no-match" when
  supplied valid input, and MAY return "error" if the input strings are
  not valid character strings or violate other collation constraints.
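The sketch below shows one permitted behaviour under the MAY. It assumes a hypothetical collation defined only over UTF-8 text, which returns "error" when either input fails to decode. An implementation that did not exercise the MAY would have to return "match" or "no-match" for that input instead. The quoted text does not say which one, and that is the gap raised above.

```python
def utf8_equality(a: bytes, b: bytes) -> str:
    """Equality for a hypothetical collation defined only on UTF-8 text.

    Returns "match" or "no-match" for valid input and "error" when either
    string is not valid UTF-8 (this sketch takes the MAY branch).
    """
    try:
        left = a.decode("utf-8")
        right = b.decode("utf-8")
    except UnicodeDecodeError:
        return "error"
    # For strictly decoded UTF-8 this is equivalent to octet equality.
    return "match" if left == right else "no-match"
```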

4.2.2.  Substring

Spencer: the following text requiring the ending offset seems inconsistent
with 5.2, which (as I understand it) allows either the ending offset OR the
length to be returned. If they ARE inconsistent, I'd much rather see 4.2.2
prevail, because I don't feel good about telling application developers that
sometimes they may get (10, 15) that means "six characters/octets long" and
other times they may get (10, 15) which means "15 characters/octets long".

  Application protocols MAY return position information for substring
  matches.  If this is done, the position information SHOULD include
  both the starting offset and the ending offset in the string.
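The (10, 15) ambiguity goes away once the convention is fixed. This sketch reports (start, end) with 0-based octet offsets and an inclusive end offset, so the match length is end - start + 1. The function name and both conventions are assumptions for illustration. An empty search string matches but covers no octets, so no position is reported for it.

```python
from __future__ import annotations


def octet_substring_position(
    pattern: bytes, target: bytes
) -> tuple[str, tuple[int, int] | None]:
    """Search for pattern in target and report (start, end) offsets.

    Offsets are 0-based octet positions; "end" is the offset of the last
    matched octet (inclusive), so the length is end - start + 1.
    """
    if not pattern:
        return "match", None  # empty pattern: matches, but has no span
    start = target.find(pattern)
    if start == -1:
        return "no-match", None
    return "match", (start, start + len(pattern) - 1)
```

With this convention, the six-octet match of b"abcdef" in b"0123456789abcdef" is reported as (10, 15). A length-based report of the same match would be (10, 6).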

4.3.  Internal Canonicalization Algorithm

Spencer: I don't believe that "The output of the canonicalization algorithm
MAY have no meaning to a human" is meant as an upper-case MAY; it's not a
requirement.

  A collation specification MUST describe the internal canonicalization
  algorithm.  This algorithm can be applied to individual strings and
  the result strings can be stored to potentially optimize future
  comparison operations.  A collation MAY specify that the
  canonicalization algorithm is the identity function.  The output of
  the canonicalization algorithm MAY have no meaning to a human.
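The "canonicalize once, compare many times" idea can be sketched as follows. The example assumes a case-insensitive ASCII collation whose canonical form upper-cases a-z. Python's bytes.upper() changes only ASCII letters, and this choice is for illustration only. The identity function is the trivial alternative the text permits. CanonicalIndex and the other names are hypothetical.

```python
from __future__ import annotations

from typing import Callable

Canonicalizer = Callable[[bytes], bytes]


def ascii_casemap_canonical(s: bytes) -> bytes:
    """Map ASCII a-z to A-Z; every other octet is left unchanged."""
    return s.upper()  # bytes.upper() only changes ASCII lowercase letters


def identity_canonical(s: bytes) -> bytes:
    """The trivial canonicalization a collation MAY specify."""
    return s


class CanonicalIndex:
    """Canonicalize stored values once, so each later equality test is a
    single dictionary lookup on canonical forms (hypothetical helper)."""

    def __init__(self, canon: Canonicalizer, values: list[bytes]) -> None:
        self._canon = canon
        # If two values share a canonical form, the later one wins.
        self._by_key = {canon(v): v for v in values}

    def lookup(self, query: bytes) -> bytes | None:
        """Return the stored value equal to query under the collation."""
        return self._by_key.get(self._canon(query))
```

With the identity canonicalizer, b"INBOX" and b"Inbox" stay distinct. With the case-mapping one, they compare equal.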

7.1.  Collation Registration Procedure

Spencer: I'm not trying to change existing practice, but the IESG is having
enough fun reviewing appeals these days that if the appeal track started
with the APPS area directors, I'm sure that the other ADs would be thrilled.
:-(

  The IETF will create a mailing list, [EMAIL PROTECTED], which can be
  used for public discussion of collation proposals prior to
  registration.  Use of the mailing list is encouraged but not
  required.  The actual registration procedure will not begin until the
  completed registration template is sent to [EMAIL PROTECTED]  The IESG
  will appoint a designated expert who will monitor the
  [EMAIL PROTECTED] mailing list and review registrations forwarded
  from IANA.  The designated expert is expected to tell IANA and the
  submitter of the registration within two weeks whether the
  registration is approved, approved with minor changes, or rejected
  with cause.  When a registration is rejected with cause, it can be
  re-submitted if the concerns listed in the cause are addressed.
  Decisions made by the designated expert can be appealed to the IESG
  and subsequently follow the normal appeals procedure for IESG
  decisions.

9.2.1.  ASCII Casemap Collation Description

Spencer: the following text really clarified the earlier text describing ACAP
and Sieve; could this sentence be used in that section as well?

  For historical reasons, in the context of ACAP and Sieve, the name
  "i;ascii-casemap" is a synonym for this collation.

9.5.1.  Octet Collation Description

Spencer: Ouch! Is there a less ambiguous naming set than "first string" and
"second string"? I'm almost sure I've also used programming languages that
thought the first string was the search target, so it took me a second to
grok that the second string was the search target. If I'm the only one who
is confused, that's not a problem.

  The substring function returns "match" if the first string is the
  empty string, or if there exists a substring of the second string of
  length equal to the length of the first string which would result in
  a "match" result from the equality function.  Otherwise the substring
  function returns "no-match".
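Here is a literal transcription of the quoted substring rule, with the roles of the two strings spelled out in comments. That is one low-cost way to address the naming concern. The function names are hypothetical. For comparison, Python's own `first in second` also puts the string being searched for first.

```python
def octet_equality(a: bytes, b: bytes) -> str:
    """Octet equality: identical octet sequences match."""
    return "match" if a == b else "no-match"


def octet_substring(first: bytes, second: bytes) -> str:
    """Substring function as worded in the quoted text.

    "first" is the string searched FOR; "second" is the string searched IN.
    """
    if first == b"":
        return "match"  # the empty first string always matches
    width = len(first)
    # Try every window of "second" that has the same length as "first".
    for i in range(len(second) - width + 1):
        if octet_equality(first, second[i:i + width]) == "match":
            return "match"
    return "no-match"
```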



_______________________________________________
Gen-art mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/gen-art
