Re: [Gen-art] Gen-ART Review of draft-newman-i18n-comparator-13.txt

Arnt Gulbrandsen Tue, 15 Aug 2006 15:49:26 -0700

Spencer Dawkins writes:

I was selected as General Area Review Team reviewer for this
specification (for background on Gen-ART, please see
http://www.alvestrand.no/ietf/gen/art/gen-art-FAQ.html).


This is a re-review; my previous review was for 06, with Scott as
shepherding AD, before IETF 65. I'm reading the deltas from 06 (in
the spirit of not finding new problems with previously-reviewed
text).

Summary: Again, nearly ready for publication as Proposed Standard,
with some (new) items that do need to be addressed before
publication.

Thanks,

Spencer

Review Comments:

2.2.  Purpose

  Collations abstraction layer for comparison functions so that these
  comparison functions can be used in multiple protocols.

I am just barely able to parse this sentence so that it's not a
sentence fragment. I think the problem is that "functions" is being
used as a verb and as a noun in the same sentence. I saw later in the
document that you had changed "function"-the-noun to "operation", so
should be easy to fix. But this isn't an editorial comment, because
I'm not sure what the sentence is saying.


It is saying "Arnt cannot search and replace".

   Collations provide a multi-protocol abstraction layer for comparison
   functions so that these comparison functions can be used in multiple
   protocols.

(Maybe strike "layer". Not sure yet. Must look at it when I'm 100% awake.)

4.2.2.  Equality
   ...
   In this specification, the return values of the equality test are
   called "match", "no-match" and "undefined".  This is not a
   specification, merely a choice of phrasing.

What does the last sentence mean? (Brian Carpenter asked me, so he
doesn't know, either).


It means: I'm not defining what these three return values are called,
only naming them so I can talk about them. If you implement this,
you're free to call them anything you want. You can use a C++ enum
type, or -1/0/1, or whatever.
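
A minimal sketch of that point, in Python rather than C++. The names
EqualityResult and toy_equal are hypothetical, and the rule that
non-ASCII input yields "undefined" is only an assumption made so that
all three values appear; the draft fixes none of this.

```python
from enum import Enum


class EqualityResult(Enum):
    """Implementation-chosen names for the three outcomes (hypothetical)."""
    MATCH = "match"
    NO_MATCH = "no-match"
    UNDEFINED = "undefined"


def toy_equal(a: bytes, b: bytes) -> EqualityResult:
    """Toy equality test for an ASCII-only collation.

    Assumption for illustration: any octet above 0x7F makes the input
    invalid for this collation, so the result is UNDEFINED.
    """
    if any(octet > 0x7F for octet in a + b):
        return EqualityResult.UNDEFINED
    return EqualityResult.MATCH if a == b else EqualityResult.NO_MATCH
```

An implementation that prefers -1/0/1 or plain strings would be equally
valid; only the three-way distinction matters.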

This rather awkward phrasing is a result of conflicting reviewer requests.

I could say «The return values of the equality test are called "match",
"no-match" and "undefined" in this document.» Would that be clear
enough?

5.2.  Operations

...

  Although the collation's substring function provides a list of
  matches, a protocol need not provide all that to the client.  It may
  provide only the first matching substring, or even just the
  information that the substring search matched.

Hmmm. I am trying to remember that you're not defining a protocol,
only describing what protocols do and don't do, but I'm trying to
read this from the application's perspective, and having a hard time
understanding how (for example) an application that is trying to
display what is matching responds when the protocol only provides an
indication that something matched. You may say this is what the
protocol developers are supposed to worry about ("if you think
applications will want to display what matches, you'd better define
the protocol so that this information is returned"), and that's OK.


Exactly.

Some current-day protocols are defined such that 'x contains y' returns
true/false. Servers implementing such protocols should still be able to
use collations.
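
The split between what the collation computes and what a protocol
exposes can be sketched as follows. This is an illustration only: the
function names are hypothetical, plain octet comparison stands in for a
real collation, and the needle is assumed to be non-empty.

```python
from typing import List, Optional, Tuple

Match = Tuple[int, int]  # (start, end) offsets into the haystack


def substring_matches(haystack: bytes, needle: bytes) -> List[Match]:
    """Collation-level operation: every occurrence, overlapping ones too."""
    matches = []
    start = haystack.find(needle)
    while start != -1:
        matches.append((start, start + len(needle)))
        start = haystack.find(needle, start + 1)
    return matches


def protocol_contains(haystack: bytes, needle: bytes) -> bool:
    """A protocol that only reports whether 'x contains y'."""
    return bool(substring_matches(haystack, needle))


def protocol_first_match(haystack: bytes, needle: bytes) -> Optional[Match]:
    """A protocol that reports only the first matching substring."""
    matches = substring_matches(haystack, needle)
    return matches[0] if matches else None
```

Both protocol-level wrappers discard information the collation already
produced, which is why a true/false protocol can still sit on top of a
collation.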

I'm just struggling a bit here.


Understandably.

6.  Use by Existing Protocols

...

  IMAP [16] also collates, although that is explicit only when the
  COMPARATOR [18] extension is used.  The built-in IMAP substring
  operation and the ordering provided by the SORT [17] extension may
  not meet the requirements made in this document.

  Other protocols may be in a similar position.

  In IMAP, the default collation is i;ascii-casemap, because its
  operations most closely resembles IMAP's built-in operations.

EDITORIAL: I'm guessing that the previous paragraph should be moved up one?


No... the «Other protocols» bit applies to the text above it, not to the
text below it. But I'll change it if one other person agrees with you.

At the very least, I'm confused because I'm not sure if the top
paragraph in this extract describes the differences between
i;ascii-casemap and IMAP's built-in operations or is talking about
something else.


I'll explain it, but I'm not sure you want to know ;)

In IMAP, the server advertises which extensions it supports. The client
does not advertise anything. Because of this asymmetry, the server
sometimes has to act in a way which satisfies both an unextended and an
extended client. This is such a case: the server must do substring and
sorting operations, and it often cannot tell whether the client knows
about collations.

What's needed is a default collation which is close enough to
current-day IMAP behaviour that unextended clients are not surprised by
what the server does. i;ascii-casemap is that. It may or may not be a
perfect match for IMAP/SORT as specified (Alexey Melnikov and I haven't
found a difference), and it's at least close to what servers typically
implement.

9.1.1.  ASCII Numeric Collation Description

  The "i;ascii-numeric" collation is a simple collation intended for
  use with arbitrary sized unsigned decimal integer numbers stored as
  octet strings.  US-ASCII digits (0x30 to 0x39) represent digits of
  the numbers.  Before converting from string to integer, the input
  string is truncated at the first non-digit character.  All input is
  valid; strings which do not start with a digit represent positive
  infinity.

Is it obvious to everyone except me that leading zeros are ignored?
The examples given a little further down say so - is making this
point in examples normative enough?


It's specified in RFC 2244, so I don't think it's very important. This
document merely registers a collation which has been specified for a
decade and implemented in many products.
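
For readers who want the quoted rules spelled out, here is a minimal
sketch of a sort key for i;ascii-numeric. The function name is
hypothetical, and using a float infinity for strings that do not start
with a digit is just one convenient representation. Leading zeros drop
out of the integer conversion.

```python
import math


def ascii_numeric_key(value: bytes):
    """Sort key following the quoted description of i;ascii-numeric."""
    digits = bytearray()
    for octet in value:
        if 0x30 <= octet <= 0x39:   # US-ASCII '0'..'9'
            digits.append(octet)
        else:
            break                   # truncate at the first non-digit
    if not digits:
        return math.inf             # no leading digit: positive infinity
    return int(digits.decode("ascii"))  # leading zeros are ignored here
```

With this key, b"0002" sorts before b"9", which sorts before b"10", and
anything that does not start with a digit sorts last.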

9.2.1.  ASCII Casemap Collation Description

...

  The i;ascii-casemap collation is well suited to to use with many
  internet protocols and computer languages.  Use with natural language
  is often inappropriate: even though the collation apparently supports
  languages such as Italian and English, in real-world use it tends to
  stumble over words such as "naive", names such as "Llwyd", people and
  place names containing non-ASCII, euro and pound sterling symbols,
  quotation marks, dashes/hyphens, etc.

OK, this may be inadvertently funny - are "naive" and "Llwyd" supposed
to include a non-ascii character, or is that sentence saying
something else? (Welcome to the world of the RFC Editor)


I would write naïve if I could. I assume people know that naive and
naïve are both common spellings.

Llwyd is thus spelt. The Welsh consider ll a separate letter and sort it
between l and m.
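
Both stumbles can be shown with a small sketch. It assumes that
i;ascii-casemap maps only the octets for a-z to upper case and then
compares octets, and that the input arrives as UTF-8; the function name
is hypothetical.

```python
def ascii_casemap_key(text: str) -> bytes:
    """Upper-case only the octets 0x61..0x7A, leave everything else alone."""
    octets = text.encode("utf-8")   # assumption: input is UTF-8
    return bytes(o - 32 if 0x61 <= o <= 0x7A else o for o in octets)


names = ["Lucas", "Llwyd", "Mair"]
casemap_order = sorted(names, key=ascii_casemap_key)
# casemap_order is ["Llwyd", "Lucas", "Mair"]; a Welsh reader would
# expect "Lucas" before "Llwyd", because ll sorts after every l-word.

same_word = ascii_casemap_key("naive") == ascii_casemap_key("naïve")
# same_word is False: the two common spellings do not match.
```

The case folding itself works as intended ("naive" and "NAIVE" give the
same key); the trouble lies entirely outside the ASCII letters and in
language-specific ordering rules.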

From Spencer: I'm thinking that the "checking the SP SP "1" SP SP
string for correctness" also needs to be done pretty soon :-0


Each and every time I run xml2rfc, actually. I grew bored :(

Arnt

_______________________________________________
Gen-art mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/gen-art
