Re: [Gen-art] Gen-ART Review of draft-newman-i18n-comparator-13.txt

Spencer Dawkins Tue, 15 Aug 2006 18:43:03 -0700

Hi, Arnt,

For a sleepy guy, you explained things pretty well. Please check below to see if we still have stuff to think about... this round is tagged "Spencer-reply:". But I think we're very close to "good to go".


Thanks,

Spencer

----- Original Message -----

From: "Arnt Gulbrandsen" <[EMAIL PROTECTED]>

To: "Spencer Dawkins" <[EMAIL PROTECTED]>

Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; "General Area Review Team" <[email protected]>; "Lisa Dusseault" <[EMAIL PROTECTED]>

Sent: Tuesday, August 15, 2006 5:00 PM
Subject: Re: [Gen-art] Gen-ART Review of draft-newman-i18n-comparator-13.txt


Spencer Dawkins writes:

I was selected as General Area Review Team reviewer for this specification (for background on Gen-ART, please see http://www.alvestrand.no/ietf/gen/art/gen-art-FAQ.html).
This is a re-review, my previous review was for 06, with Scott as shepherding AD, before IETF 65. I'm reading the deltas from 06 (in the spirit of not finding new problems with previously-reviewed text).
Summary: Again, nearly ready for publication as Proposed Standard, with some (new) items that do need to be addressed before publication.
Thanks,

Spencer

Review Comments:

2.2.  Purpose

  Collations abstraction layer for comparison functions so that these
  comparison functions can be used in multiple protocols.
I am just barely able to parse this sentence so that it's not a sentence fragment. I think the problem is that "functions" is being used as a verb and as a noun in the same sentence. I saw later in the document that you had changed "function"-the-noun to "operation", so should be easy to fix. But this isn't an editorial comment, because I'm not sure what the sentence is saying.


It is saying "Arnt cannot search and replace".

   Collations provide a multi-protocol abstraction layer for comparison
   functions so that these comparison functions can be used in multiple
   protocols.

(Maybe strike "layer". Not sure yet. Must look at it when I'm 100% awake.)

Spencer-reply: This works for me, with a slight preference for striking "layer", but either way would work for me.

4.2.2.  Equality
   ...
   In this specification, the return values of the equality test are
   called "match", "no-match" and "undefined".  This is not a
   specification, merely a choice of phrasing.

What does the last sentence mean? (Brian Carpenter asked me, so he doesn't know, either).


It means: I'm not defining what these three return values are called,
only naming them so I can talk about them. If you implement this,
you're free to call them anything you want. You can use a C++ enum
type, or -1/0/1, or whatever.
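As an illustration of that point, here is a minimal sketch of a three-valued equality result in Python. The names EqualityResult and equality are hypothetical, as is the use of None to stand in for input the collation cannot interpret; the draft only names the three results and does not prescribe any representation.

```python
from enum import Enum
from typing import Optional


class EqualityResult(Enum):
    """The three results named in the draft; the Python names are illustrative."""
    MATCH = "match"
    NO_MATCH = "no-match"
    UNDEFINED = "undefined"


def equality(a: Optional[str], b: Optional[str]) -> EqualityResult:
    # Assumption: None models an input the collation cannot interpret.
    if a is None or b is None:
        return EqualityResult.UNDEFINED
    # Plain string comparison stands in for the collation's real rules.
    return EqualityResult.MATCH if a == b else EqualityResult.NO_MATCH
```

An implementation could just as well return -1/0/1 or three booleans; only the three-way distinction matters.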

This rather awkward phrasing is a result of conflicting reviewer requests.

I could say «The return values of the equality test are called "match",
"no-match" and "undefined" in this document.» Would that be clear
enough?

Spencer-reply: That would work for me - I know Brian needed help, too, and he's the one that needs to be happy. But we're within an RFC Editor note of finished on this.

5.2.  Operations

...

  Although the collation's substring function provides a list of
  matches, a protocol need not provide all that to the client.  It may
  provide only the first matching substring, or even just the
  information that the substring search matched.
Hmmm. I am trying to remember that you're not defining a protocol, only describing what protocols do and don't do, but I'm trying to read this from the application's perspective, and having a hard time understanding how (for example) an application that is trying to display what is matching responds when the protocol only provides an indication that something matched. You may say this is what the protocol developers are supposed to worry about ("if you think applications will want to display what matches, you'd better define the protocol so that this information is returned"), and that's OK.


Exactly.

Some current-day protocols are defined such that 'x contains y' returns
true/false. Servers implementing such protocols should still be able to
use collations.
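A minimal sketch of that layering in Python, assuming a collation whose substring operation returns every match as a (start, end) pair. The function names are hypothetical and plain str.find stands in for the collation's own matching rules.

```python
from typing import List, Optional, Tuple

Match = Tuple[int, int]  # (start, end) offsets into the haystack


def substring_matches(needle: str, haystack: str) -> List[Match]:
    """Collation-level operation: report every match, overlapping ones included."""
    matches: List[Match] = []
    if not needle:
        return matches
    start = haystack.find(needle)
    while start != -1:
        matches.append((start, start + len(needle)))
        start = haystack.find(needle, start + 1)
    return matches


def first_match(needle: str, haystack: str) -> Optional[Match]:
    """A protocol that exposes only the first matching substring."""
    matches = substring_matches(needle, haystack)
    return matches[0] if matches else None


def contains(needle: str, haystack: str) -> bool:
    """A protocol where 'x contains y' is just true/false."""
    return bool(substring_matches(needle, haystack))
```

The protocol decides how much of the list reaches the client; the collation underneath is the same in all three cases.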

Spencer-reply: If you stuck "In this way, servers can use collations to support protocols that are defined such that 'x contains y' returns true-false." at the end of the paragraph, I'd "get it" without help.

I'm just struggling a bit here.


Understandably.

6.  Use by Existing Protocols

...

  IMAP [16] also collates, although that is explicit only when the
  COMPARATOR [18] extension is used.  The built-in IMAP substring
  operation and the ordering provided by the SORT [17] extension may
  not meet the requirements made in this document.

  Other protocols may be in a similar position.

  In IMAP, the default collation is i;ascii-casemap, because its
  operations most closely resembles IMAP's built-in operations.

EDITORIAL: I'm guessing that the previous paragraph should be moved up one?


No... the «Other protocols» bit applies to the text above it, not to the
text below it. But I'll change it if one other person agrees with you.

Spencer-reply: Better not change it, because I was confused when I suggested it (this does not augur well for suggested changes). But please see next item.

At the very least, I'm confused because I'm not sure if the top paragraph in this extract describes the differences between i;ascii-casemap and IMAP's built-in operations or is talking about something else.


I'll explain it, but I'm not sure you want to know ;)

In IMAP, the server advertises which extensions it supports. The client
does not advertise anything. Because of this asymmetry, the server
sometimes has to act in a way which satisfies both an unextended and an
extended client. This is such a case: The server must do substring and
sorting operations, and it often cannot tell whether the client knows
about collations.

What's needed is a default collation which is close enough to
current-day IMAP behaviour that unextended clients are not surprised by
what the server does. i;ascii-casemap is that. It may or may not be a
perfect match for IMAP/SORT as specified (Alexey Melnikov and I haven't
found a difference), and it's at least close to what servers typically
implement.

Spencer-reply: Again, your explanation helped a lot and I'd like to hijack it into the document. Something like "In IMAP, the default collation is i;ascii-casemap, since its operations are understood to match IMAP's built-in operations."?

9.1.1.  ASCII Numeric Collation Description

  The "i;ascii-numeric" collation is a simple collation intended for
  use with arbitrary sized unsigned decimal integer numbers stored as
  octet strings.  US-ASCII digits (0x30 to 0x39) represent digits of
  the numbers.  Before converting from string to integer, the input
  string is truncated at the first non-digit character.  All input is
  valid; strings which do not start with a digit represent positive
  infinity.
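The description above can be sketched in Python as follows. This is an illustrative reading of the quoted paragraph, not the registered definition; the function name ascii_numeric_value is hypothetical. Because the digit prefix is converted to an integer, leading zeros drop out on their own.

```python
import math
from typing import Union


def ascii_numeric_value(octets: bytes) -> Union[int, float]:
    """Map an octet string to the number it represents under this reading."""
    digits = bytearray()
    for octet in octets:
        # Truncate at the first octet outside US-ASCII '0'..'9' (0x30 to 0x39).
        if not 0x30 <= octet <= 0x39:
            break
        digits.append(octet)
    if not digits:
        # Strings that do not start with a digit represent positive infinity.
        return math.inf
    # Note: CPython 3.11+ limits int() to 4300 digits unless
    # sys.set_int_max_str_digits() raises the limit.
    return int(digits.decode("ascii"))
```

Comparing the returned values with the usual numeric operators then gives the ordering for this sketch.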

Is it obvious to everyone except me that leading zeros are ignored? The examples given a little further down say so - is making this point in examples normative enough?


It's specified in 2244, so I don't think it's very important. This
document merely registers a collation which has been specified for a
decade and implemented in many products.

Spencer-reply: ack.

9.2.1.  ASCII Casemap Collation Description

...

  The i;ascii-casemap collation is well suited to to use with many
  internet protocols and computer languages.  Use with natural language
  is often inappropriate: even though the collation apparently supports
  languages such as Italian and English, in real-world use it tends to
  stumble over words such as "naive", names such as "Llwyd", people and
  place names containing non-ASCII, euro and pound sterling symbols,
  quotation marks, dashes/hyphens, etc.

OK, this may be inadvertently funny - are "naive" and "Llwyd" supposed to include a non-ascii character, or is that sentence saying something else? (Welcome to the world of the RFC Editor)


I would write naïve if I could. I assume people know that naive and
naïve are both common spellings.

Llwyd is thus spelt. The Welsh consider ll a separate letter and sort it
between l and m.
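Both stumbles can be reproduced with a small Python sketch. It assumes, as my understanding of the collation rather than anything stated in this thread, that i;ascii-casemap maps octets a-z to A-Z and then compares octet by octet; the function name ascii_casemap_key and the UTF-8 encoding of the input are assumptions made for illustration.

```python
def ascii_casemap_key(text: str) -> bytes:
    """Upper-case only US-ASCII a-z (octets 97-122); leave every other octet alone."""
    octets = text.encode("utf-8")  # assumption: input arrives as UTF-8
    return bytes(o - 32 if 97 <= o <= 122 else o for o in octets)


# ASCII case differences disappear...
same_word = ascii_casemap_key("Naive") == ascii_casemap_key("NAIVE")

# ...but the two common spellings are treated as different strings.
spellings_differ = ascii_casemap_key("naive") != ascii_casemap_key("naïve")

# "Llwyd" is compared letter by letter as L, L, W, ..., so it sorts before
# "Lol", whereas sorting "ll" between "l" and "m" would put it after.
ordered = sorted(["Lol", "Llwyd"], key=ascii_casemap_key)
```

Nothing here is wrong from the collation's point of view; it simply has no notion of "ï" or of a two-character letter.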

Spencer-reply: I guess my point was that this was extremely subtle for those of us who don't work with i18n comparison all day long. Perhaps 'Welsh names such as "L1wyd", when the Welsh consider "ll" a separate letter and sort it between "1" and "m"'? But you're going to have to figure out how to get "naïve" into an RFC... Perhaps your AD can step in front of this speeding bullet?

From Spencer: I'm thinking that the "checking the SP SP "1" SP SP string for correctness" also needs to be done pretty soon :-0


Each and every time I run xml2rfc, actually. I grew bored :(

Arnt



_______________________________________________
Gen-art mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/gen-art
