Hi, Arnt,

For a sleepy guy, you explained things pretty well. Please check below to see if we still have stuff to think about... this round is tagged "Spencer-reply:". But I think we're very close to "good to go".

Thanks,

Spencer

----- Original Message -----
From: "Arnt Gulbrandsen" <[EMAIL PROTECTED]>
To: "Spencer Dawkins" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; "General Area Review Team" <[email protected]>; "Lisa Dusseault" <[EMAIL PROTECTED]>
Sent: Tuesday, August 15, 2006 5:00 PM
Subject: Re: [Gen-art] Gen-ART Review of draft-newman-i18n-comparator-13.txt


Spencer Dawkins writes:
I was selected as General Area Review Team reviewer for this specification (for background on Gen-ART, please see http://www.alvestrand.no/ietf/gen/art/gen-art-FAQ.html).

This is a re-review; my previous review was for -06, with Scott as shepherding AD, before IETF 65. I'm reading the deltas from -06 (in the spirit of not finding new problems with previously reviewed text).

Summary: Again, nearly ready for publication as Proposed Standard, with some (new) items that do need to be addressed before publication.

Thanks,

Spencer

Review Comments:

2.2.  Purpose

  Collations abstraction layer for comparison functions so that these
  comparison functions can be used in multiple protocols.

I am just barely able to parse this sentence so that it's not a sentence fragment. I think the problem is that "functions" is being used as a verb and as a noun in the same sentence. I saw later in the document that you had changed "function"-the-noun to "operation", so this should be easy to fix. But this isn't an editorial comment, because I'm not sure what the sentence is saying.

It is saying "Arnt cannot search and replace".

   Collations provide a multi-protocol abstraction layer for comparison
   functions so that these comparison functions can be used in multiple
   protocols.

(Maybe strike "layer". Not sure yet. Must look at it when I'm 100% awake.)

Spencer-reply: This works for me. I slightly prefer striking "layer", but either way is fine.
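To make the "abstraction layer" idea concrete, here is a minimal Python sketch assuming a hypothetical registry keyed by collation name. The handlers imap_like_search and sieve_like_test are invented for illustration; the only point is that protocol code looks a collation up by name and never implements the comparison itself.

```python
from typing import Callable, Dict

# A collation's equality test on octet strings.
Equality = Callable[[bytes, bytes], bool]

# Hypothetical registry: protocols refer to collations only by name,
# so one comparison implementation can serve several protocols.
COLLATIONS: Dict[str, Equality] = {
    "i;octet": lambda a, b: a == b,
    # bytes.upper() folds only ASCII a-z, leaving other octets alone.
    "i;ascii-casemap": lambda a, b: a.upper() == b.upper(),
}


def imap_like_search(collation: str, field: bytes, key: bytes) -> bool:
    """Hypothetical IMAP-style handler; the comparison is delegated."""
    return COLLATIONS[collation](field, key)


def sieve_like_test(collation: str, header: bytes, value: bytes) -> bool:
    """Hypothetical Sieve-style handler reusing the same collation code."""
    return COLLATIONS[collation](header, value)
```

Adding a collation to the registry makes it available to both handlers at once, which is the reuse the sentence is after.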

4.2.2.  Equality
   ...
   In this specification, the return values of the equality test are
   called "match", "no-match" and "undefined".  This is not a
   specification, merely a choice of phrasing.

What does the last sentence mean? (Brian Carpenter asked me, so he doesn't know, either).

It means: I'm not defining what these three return values are called,
only naming them so I can talk about them. If you implement this,
you're free to call them anything you want. You can use a C++ enum
type, or -1/0/1, or whatever.

This rather awkward phrasing is a result of conflicting reviewer requests.

I could say «The return values of the equality test are called "match",
"no-match" and "undefined" in this document.» Would that be clear
enough?

Spencer-reply: That would work for me - I know Brian needed help, too, and he's the one who needs to be happy. But we're within an RFC Editor note of finished on this.
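A minimal sketch of what "merely a choice of phrasing" permits: the three outcomes as a Python enum, with an equivalent -1/0/1 encoding alongside. The collation utf8_octet_equal is hypothetical; it assumes, for illustration only, that input which is not valid UTF-8 yields "undefined".

```python
from enum import Enum


class EqResult(Enum):
    # The names are only labels; an implementation may use anything.
    MATCH = "match"
    NO_MATCH = "no-match"
    UNDEFINED = "undefined"


def utf8_octet_equal(a: bytes, b: bytes) -> EqResult:
    """Hypothetical collation: compares octets, accepts only valid UTF-8."""
    for s in (a, b):
        try:
            s.decode("utf-8")
        except UnicodeDecodeError:
            return EqResult.UNDEFINED  # invalid input for this collation
    return EqResult.MATCH if a == b else EqResult.NO_MATCH


# An equally valid integer encoding; this particular mapping is arbitrary.
AS_INT = {EqResult.MATCH: 1, EqResult.NO_MATCH: 0, EqResult.UNDEFINED: -1}
```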

5.2.  Operations

...

  Although the collation's substring function provides a list of
  matches, a protocol need not provide all that to the client.  It may
  provide only the first matching substring, or even just the
  information that the substring search matched.

Hmmm. I am trying to remember that you're not defining a protocol, only describing what protocols do and don't do, but I'm trying to read this from the application's perspective, and having a hard time understanding how (for example) an application that is trying to display what is matching responds when the protocol only provides an indication that something matched. You may say this is what the protocol developers are supposed to worry about ("if you think applications will want to display what matches, you'd better define the protocol so that this information is returned"), and that's OK.

Exactly.

Some current-day protocols are defined such that 'x contains y' returns
true/false. Servers implementing such protocols should still be able to
use collations.

Spencer-reply: If you stuck "In this way, servers can use collations to support protocols that are defined such that 'x contains y' returns true/false." at the end of the paragraph, I'd "get it" without help.

I'm just struggling a bit here.

Understandably.
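The split between what a collation computes and what a protocol reports can be sketched as follows, assuming i;ascii-casemap semantics (fold ASCII a-z to A-Z, then compare octets). The function names are hypothetical. contains() is all that a protocol defined as "x contains y returns true/false" would keep, while first_match() keeps only the first matching substring.

```python
from typing import List, Optional, Tuple


def casemap_matches(haystack: bytes, needle: bytes) -> List[Tuple[int, int]]:
    """Collation side: (start, end) offsets of every match, overlaps included."""
    h, n = haystack.upper(), needle.upper()  # ASCII-only case folding
    matches: List[Tuple[int, int]] = []
    start = h.find(n)
    while start != -1:
        matches.append((start, start + len(n)))
        start = h.find(n, start + 1)
    return matches


def contains(haystack: bytes, needle: bytes) -> bool:
    """Protocol side, boolean flavour: only 'did anything match?' survives."""
    return bool(casemap_matches(haystack, needle))


def first_match(haystack: bytes, needle: bytes) -> Optional[Tuple[int, int]]:
    """Protocol side, 'first substring' flavour."""
    found = casemap_matches(haystack, needle)
    return found[0] if found else None
```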

6.  Use by Existing Protocols

...

  IMAP [16] also collates, although that is explicit only when the
  COMPARATOR [18] extension is used.  The built-in IMAP substring
  operation and the ordering provided by the SORT [17] extension may
  not meet the requirements made in this document.

  Other protocols may be in a similar position.

  In IMAP, the default collation is i;ascii-casemap, because its
  operations most closely resembles IMAP's built-in operations.

EDITORIAL: I'm guessing that the previous paragraph should be moved up one?

No... the «Other protocols» bit applies to the text above it, not to the
text below it. But I'll change it if one other person agrees with you.

Spencer-reply: Better not change it, because I was confused when I suggested it (this does not augur well for suggested changes). But please see next item.

At the very least, I'm confused because I'm not sure if the top paragraph in this extract describes the differences between i;ascii-casemap and IMAP's built-in operations or is talking about something else.

I'll explain it, but I'm not sure you want to know ;)

In IMAP, the server advertises which extensions it supports. The client
does not advertise anything. Because of this asymmetry, the server
sometimes has to act in a way which satisfies both an unextended and an
extended client. This is such a case: The server must do substring and
sorting operations, and it often cannot tell whether the client knows
about collations.

What's needed is a default collation which is close enough to
current-day IMAP behaviour that unextended clients are not surprised by
what the server does. i;ascii-casemap is that. It may or may not be a
perfect match for IMAP/SORT as specified (Alexey Melnikov and I haven't
found a difference), and it's at least close to what servers typically
implement.

Spencer-reply: Again, your explanation helped a lot and I'd like to hijack it into the document. Something like "In IMAP, the default collation is i;ascii-casemap, since its operations are understood to match IMAP's built-in operations."?
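For reference, here is a minimal sketch of the i;ascii-casemap behaviour discussed above. It assumes the collation's usual definition: ASCII a-z are mapped to A-Z and the results are compared octet by octet, as i;octet would. Python's bytes.upper() changes only ASCII letters, which matches that mapping. The function names are hypothetical.

```python
from typing import List


def ascii_casemap_key(s: bytes) -> bytes:
    # bytes.upper() maps only ASCII a-z (0x61-0x7A) to A-Z (0x41-0x5A);
    # every other octet, including non-ASCII ones, is left as it is.
    return s.upper()


def ascii_casemap_equal(a: bytes, b: bytes) -> bool:
    return ascii_casemap_key(a) == ascii_casemap_key(b)


def ascii_casemap_compare(a: bytes, b: bytes) -> int:
    # After folding, order by octet value as i;octet does; Python's bytes
    # comparison is already lexicographic on unsigned octets.
    ka, kb = ascii_casemap_key(a), ascii_casemap_key(b)
    return (ka > kb) - (ka < kb)


def ascii_casemap_sort(items: List[bytes]) -> List[bytes]:
    return sorted(items, key=ascii_casemap_key)
```

Because the fold goes to upper case, "_" (0x5F) sorts after the letters here. Folding to lower case instead would put it before them.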

9.1.1.  ASCII Numeric Collation Description

  The "i;ascii-numeric" collation is a simple collation intended for
  use with arbitrary sized unsigned decimal integer numbers stored as
  octet strings.  US-ASCII digits (0x30 to 0x39) represent digits of
  the numbers.  Before converting from string to integer, the input
  string is truncated at the first non-digit character.  All input is
  valid; strings which do not start with a digit represent positive
  infinity.

Is it obvious to everyone except me that leading zeros are ignored? The examples given a little further down say so - is making this point in examples normative enough?

It's specified in RFC 2244, so I don't think it's very important. This
document merely registers a collation which has been specified for a
decade and implemented in many products.

Spencer-reply: ack.
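The following is a minimal sketch of the i;ascii-numeric rules quoted above. It shows that leading zeros fall away once the digit prefix is converted to an integer. The function names are hypothetical and None stands in for positive infinity. Treating two infinite values as equal is an assumption, consistent with "represent the same number".

```python
from typing import Optional


def ascii_numeric_value(s: bytes) -> Optional[int]:
    """Parse per i;ascii-numeric; None stands for positive infinity."""
    digits = bytearray()
    for c in s:
        if 0x30 <= c <= 0x39:  # US-ASCII '0'..'9'
            digits.append(c)
        else:
            break  # truncate at the first non-digit
    if not digits:
        return None  # does not start with a digit: positive infinity
    return int(digits.decode("ascii"))  # leading zeros vanish here


def ascii_numeric_compare(a: bytes, b: bytes) -> int:
    """Return -1, 0 or 1, ordering the numbers the strings represent."""
    va, vb = ascii_numeric_value(a), ascii_numeric_value(b)
    if va is None and vb is None:
        return 0  # both are positive infinity
    if va is None:
        return 1  # infinity exceeds any finite value
    if vb is None:
        return -1
    return (va > vb) - (va < vb)
```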

9.2.1.  ASCII Casemap Collation Description

...

  The i;ascii-casemap collation is well suited to to use with many
  internet protocols and computer languages.  Use with natural language
  is often inappropriate: even though the collation apparently supports
  languages such as Italian and English, in real-world use it tends to
  stumble over words such as "naive", names such as "Llwyd", people and
  place names containing non-ASCII, euro and pound sterling symbols,
  quotation marks, dashes/hyphens, etc.

OK, this may be inadvertently funny - are "naive" and "Llwyd" supposed to include a non-ASCII character, or is that sentence saying something else? (Welcome to the world of the RFC Editor)

I would write naïve if I could. I assume people know that naive and
naïve are both common spellings.

Llwyd is thus spelt. The Welsh consider ll a separate letter and sort it
between l and m.

Spencer-reply: I guess my point was that this was extremely subtle for those of us who don't work with i18n comparison all day long. Perhaps 'Welsh names such as "Llwyd", since the Welsh consider "ll" a separate letter and sort it between "l" and "m"'? But you're going to have to figure out how to get "naïve" into an RFC... Perhaps your AD can step in front of this speeding bullet?
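To show why "Llwyd" trips up i;ascii-casemap, here is a greatly simplified, hypothetical Welsh-style key that treats only "ll" as a single letter sorting between "l" and "m". Real Welsh ordering has further digraphs, such as "ch" and "dd", that this ignores. The word list is illustrative.

```python
from typing import List, Tuple


def ascii_casemap_key(word: str) -> bytes:
    # i;ascii-casemap: fold ASCII a-z to A-Z, then compare octets.
    return word.encode("utf-8").upper()


def welsh_ll_key(word: str) -> List[Tuple[int, int]]:
    """Hypothetical, simplified key: only the 'll' digraph is handled."""
    w = word.lower()
    key: List[Tuple[int, int]] = []
    i = 0
    while i < len(w):
        if w.startswith("ll", i):
            # One letter ranked after every plain 'l' but before 'm'.
            key.append((ord("l"), 1))
            i += 2
        else:
            key.append((ord(w[i]), 0))
            i += 1
    return key


words = ["Morgan", "Lowri", "Llwyd"]
casemap_order = sorted(words, key=ascii_casemap_key)  # Llwyd, Lowri, Morgan
welsh_order = sorted(words, key=welsh_ll_key)         # Lowri, Llwyd, Morgan
```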

From Spencer: I'm thinking that the "checking the SP SP "1" SP SP string for correctness" also needs to be done pretty soon :-0

Each and every time I run xml2rfc, actually. I grew bored :(

Arnt



_______________________________________________
Gen-art mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/gen-art
