Hi, Arnt,
For a sleepy guy, you explained things pretty well. Please check below to
see if we still have stuff to think about... this round is tagged
"Spencer-reply:". But I think we're very close to "good to go".
Thanks,
Spencer
----- Original Message -----
From: "Arnt Gulbrandsen" <[EMAIL PROTECTED]>
To: "Spencer Dawkins" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; "General Area Review
Team" <[email protected]>; "Lisa Dusseault" <[EMAIL PROTECTED]>
Sent: Tuesday, August 15, 2006 5:00 PM
Subject: Re: [Gen-art] Gen-ART Review of draft-newman-i18n-comparator-13.txt
Spencer Dawkins writes:
I was selected as General Area Review Team reviewer for this
specification (for background on Gen-ART, please see
http://www.alvestrand.no/ietf/gen/art/gen-art-FAQ.html).
This is a re-review, my previous review was for 06, with Scott as
shepherding AD, before IETF 65. I'm reading the deltas from 06 (in the
spirit of not finding new problems with previously-reviewed text).
Summary: Again, nearly ready for publication as Proposed Standard, with
some (new) items that do need to be addressed before publication.
Thanks,
Spencer
Review Comments:
2.2. Purpose
Collations abstraction layer for comparison functions so that these
comparison functions can be used in multiple protocols.
I am just barely able to parse this sentence so that it's not a sentence
fragment. I think the problem is that "functions" is being used as a verb
and as a noun in the same sentence. I saw later in the document that you
had changed "function"-the-noun to "operation", so should be easy to fix.
But this isn't an editorial comment, because I'm not sure what the
sentence is saying.
It is saying "Arnt cannot search and replace".
Collations provide a multi-protocol abstraction layer for comparison
functions so that these comparison functions can be used in multiple
protocols.
(Maybe strike "layer". Not sure yet. Must look at it when I'm 100% awake.)
Spencer-reply: This works for me, with a slight preference for striking
"layer", but either way would work for me.
4.2.2. Equality
...
In this specification, the return values of the equality test are
called "match", "no-match" and "undefined". This is not a
specification, merely a choice of phrasing.
What does the last sentence mean? (Brian Carpenter asked me, so he doesn't
know, either).
It means: I'm not defining what these three return values are called,
only naming them so I can talk about them. If you implement this,
you're free to call them anything you want. You can use a C++ enum
type, or -1/0/1, or whatever.
This rather awkward phrasing is a result of conflicting reviewer requests.
I could say «The return values of the equality test are called "match",
"no-match" and "undefined" in this document.» Would that be clear
enough?
Spencer-reply: That would work for me - I know Brian needed help, too, and
he's the one that needs to be happy. But we're within an RFC Editor note of
finished on this.
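To illustrate the point that the three names are only labels, here is a minimal Python sketch. Everything in it is hypothetical: the EqualityResult enum, the ascii_only_lower canonicalizer, and the rule that invalid input on either side gives "undefined" are assumptions made for the example, not text from the draft.

```python
from enum import Enum


class EqualityResult(Enum):
    """Illustrative names only; an implementation may call these anything."""
    MATCH = "match"
    NO_MATCH = "no-match"
    UNDEFINED = "undefined"


def ascii_only_lower(text):
    """Hypothetical canonicalizer: rejects non-ASCII input by returning None."""
    if not text.isascii():
        return None
    return text.lower()


def equality(a, b, canonicalize=ascii_only_lower):
    """Three-valued equality test over a caller-supplied canonicalizer."""
    ca, cb = canonicalize(a), canonicalize(b)
    if ca is None or cb is None:
        # Assumption: invalid input on either side yields "undefined".
        return EqualityResult.UNDEFINED
    return EqualityResult.MATCH if ca == cb else EqualityResult.NO_MATCH
```

An implementation could just as well return -1/0/1 or a C++ enum; only the three-way distinction matters.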
5.2. Operations
...
Although the collation's substring function provides a list of
matches, a protocol need not provide all that to the client. It may
provide only the first matching substring, or even just the
information that the substring search matched.
Hmmm. I am trying to remember that you're not defining a protocol, only
describing what protocols do and don't do, but I'm trying to read this
from the application's perspective, and having a hard time understanding
how (for example) an application that is trying to display what is
matching responds when the protocol only provides an indication that
something matched. You may say this is what the protocol developers are
supposed to worry about ("if you think applications will want to display
what matches, you'd better define the protocol so that this information is
returned"), and that's OK.
Exactly.
Some current-day protocols are defined such that 'x contains y' returns
true/false. Servers implementing such protocols should still be able to
use collations.
Spencer-reply: If you stuck "In this way, servers can use collations to
support protocols that are defined such that 'x contains y' returns
true/false." at the end of the paragraph, I'd "get it" without help.
I'm just struggling a bit here.
Understandably.
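A rough Python sketch of that layering, assuming plain octet-by-octet matching (the function names substring_matches, first_match and contains are made up for this example): the collation-level operation returns every match, and a protocol is free to expose less.

```python
def substring_matches(haystack: bytes, needle: bytes):
    """Collation-level operation: every (offset, length) where needle occurs."""
    matches = []
    start = haystack.find(needle)
    while start != -1:
        matches.append((start, len(needle)))
        # Step by one octet so overlapping matches are also reported.
        start = haystack.find(needle, start + 1)
    return matches


def first_match(haystack: bytes, needle: bytes):
    """A protocol that only reports the first matching substring."""
    matches = substring_matches(haystack, needle)
    return matches[0] if matches else None


def contains(haystack: bytes, needle: bytes) -> bool:
    """A protocol where 'x contains y' just returns true/false."""
    return bool(substring_matches(haystack, needle))
```

A client of the contains-style protocol cannot highlight what matched; that trade-off belongs to the protocol designer, not to the collation.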
6. Use by Existing Protocols
...
IMAP [16] also collates, although that is explicit only when the
COMPARATOR [18] extension is used. The built-in IMAP substring
operation and the ordering provided by the SORT [17] extension may
not meet the requirements made in this document.
Other protocols may be in a similar position.
In IMAP, the default collation is i;ascii-casemap, because its
operations most closely resemble IMAP's built-in operations.
EDITORIAL: I'm guessing that the previous paragraph should be moved up
one?
No... the «Other protocols» bit applies to the text above it, not to the
text below it. But I'll change it if one other person agrees with you.
Spencer-reply: Better not change it, because I was confused when I suggested
it (this does not augur well for suggested changes). But please see next
item.
At the very least, I'm confused because I'm not sure if the top paragraph
in this extract describes the differences between i;ascii-casemap and
IMAP's built-in operations or is talking about something else.
I'll explain it, but I'm not sure you want to know ;)
In IMAP, the server advertises which extensions it supports. The client
does not advertise anything. Because of this asymmetry, the server
sometimes has to act in a way which satisfies both an unextended and an
extended client. This is such a case: The server must do substring and
sorting operations, and it often cannot tell whether the client knows
about collations.
What's needed is a default collation which is close enough to
current-day IMAP behaviour that unextended clients are not surprised by
what the server does. i;ascii-casemap is that. It may or may not be a
perfect match for IMAP/SORT as specified (Alexey Melnikov and I haven't
found a difference), and it's at least close to what servers typically
implement.
Spencer-reply: Again, your explanation helped a lot and I'd like to hijack
it into the document. Something like "In IMAP, the default collation is
i;ascii-casemap, since its operations are understood to match IMAP's
built-in operations."?
9.1.1. ASCII Numeric Collation Description
The "i;ascii-numeric" collation is a simple collation intended for
use with arbitrary sized unsigned decimal integer numbers stored as
octet strings. US-ASCII digits (0x30 to 0x39) represent digits of
the numbers. Before converting from string to integer, the input
string is truncated at the first non-digit character. All input is
valid; strings which do not start with a digit represent positive
infinity.
Is it obvious to everyone except me that leading zeros are ignored? The
examples given a little further down say so - is making this point in
examples normative enough?
It's specified in 2244, so I don't think it's very important. This
document merely registers a collation which has been specified for a
decade and implemented in many products.
Spencer-reply: ack.
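For anyone who wants to see the leading-zero behaviour spelled out, here is a small Python sketch of the description quoted above. The sort-key representation and the name ascii_numeric_key are assumptions for illustration; the sketch compares digit strings rather than converting to an integer so that arbitrarily long numbers work.

```python
def ascii_numeric_key(s: bytes) -> tuple:
    """Sort key following the quoted i;ascii-numeric description (a sketch)."""
    end = 0
    # Truncate at the first octet that is not a US-ASCII digit (0x30 to 0x39).
    while end < len(s) and 0x30 <= s[end] <= 0x39:
        end += 1
    if end == 0:
        # No leading digit: treated as positive infinity, sorting after
        # every finite number.
        return (1, 0, b"")
    # Leading zeros do not change the value, so drop them before comparing.
    digits = s[:end].lstrip(b"0")
    # With zeros stripped, a longer digit string is a larger number, and
    # equal-length digit strings compare correctly octet by octet.
    return (0, len(digits), digits)
```

With this key, b"0042" and b"42abc" both compare equal to b"42", and b"abc" sorts after any string that starts with a digit.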
9.2.1. ASCII Casemap Collation Description
...
The i;ascii-casemap collation is well suited to use with many
internet protocols and computer languages. Use with natural language
is often inappropriate: even though the collation apparently supports
languages such as Italian and English, in real-world use it tends to
stumble over words such as "naive", names such as "Llwyd", people and
place names containing non-ASCII, euro and pound sterling symbols,
quotation marks, dashes/hyphens, etc.
OK, this may be inadvertently funny - are "naive" and "Llwyd" supposed to
include a non-ascii character, or is that sentence saying something else?
(Welcome to the world of the RFC Editor)
I would write naïve if I could. I assume people know that naive and
naïve are both common spellings.
Llwyd is thus spelt. The Welsh consider ll a separate letter and sort it
between l and m.
Spencer-reply: I guess my point was that this was extremely subtle for those
of us who don't work with i18n comparison all day long. Perhaps 'Welsh names
such as "Llwyd", when the Welsh consider "ll" a separate letter and sort it
between "l" and "m"'? But you're going to have to figure out how to get
"naïve" into an RFC... Perhaps your AD can step in front of this speeding
bullet?
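The subtlety may be easier to see in code. This is a minimal Python sketch, assuming the casemap step simply maps US-ASCII a-z to A-Z and then compares octets; the name ascii_casemap_key and the sample names "Lovell" and "lewis" are invented for the example.

```python
def ascii_casemap_key(s: bytes) -> bytes:
    """Map US-ASCII lowercase (0x61-0x7A) to uppercase; leave other octets."""
    return bytes(o - 0x20 if 0x61 <= o <= 0x7A else o for o in s)


names = [b"Lovell", b"Llwyd", b"lewis"]
by_casemap = sorted(names, key=ascii_casemap_key)
# Octet order after case mapping puts "Llwyd" before "Lovell", because it
# compares the second "L" with "O". A Welsh reader, treating "ll" as one
# letter that follows "l", would expect "Llwyd" after "Lovell".

# "naive" and "naïve" (here encoded as UTF-8) do not compare equal either,
# since the non-ASCII octets pass through unchanged.
same = ascii_casemap_key("naive".encode("utf-8")) == ascii_casemap_key(
    "naïve".encode("utf-8")
)
```

Here by_casemap comes out as lewis, Llwyd, Lovell, and same is False.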
From Spencer: I'm thinking that the "checking the SP SP "1" SP SP string
for correctness" also needs to be done pretty soon :-0
Each and every time I run xml2rfc, actually. I grew bored :(
Arnt