Re: Getting rid of the sequence numbers

2003-02-21 Thread Timo Sirainen
On Fri, 2003-02-21 at 08:17, Mark Crispin wrote:
 On Thu, 21 Feb 2003, Timo Sirainen wrote:
  I'd like to know how you can make a client efficiently handle sequence
  numbers. If internal message structure contains just the sequence
  number, it has to be updated every time an older message is deleted.
 
 An obvious structure is a vector of pointers to a msgstruct, indexed by
 sequence number.  All you have to do for EXPUNGE is byte-blat the pointers
 down; that's a memmove() which some processors implement in hardware.

msgstruct -> sequence lookups would then have to find the msgstruct in
the array. Not a very fast operation, really.

 You don't have to do the blat to EXPUNGE based upon UIDs, but you have to
 have some sort of hash based on UID to locate the msgstruct.  Locating a
 msgstruct is a much more common operation than expunging one.

And creating a fetch request for a message is almost as common an
operation. Doing several slow array lookups there to find the sequence
numbers could well be slower overall than constant-time UID hash lookups.
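
For readers following the argument, here is a minimal sketch of the two client-side structures being compared: a vector indexed by sequence number and a UID hash. It is written in Python rather than C, and `MsgStruct`, `MessageIndex` and their methods are hypothetical names for illustration, not c-client's actual API.

```python
from dataclasses import dataclass


@dataclass
class MsgStruct:
    uid: int
    subject: str = ""


class MessageIndex:
    """Sequence-indexed vector plus a UID -> msgstruct hash (illustrative)."""

    def __init__(self, messages):
        self.by_seq = list(messages)                # by_seq[n - 1] is message n
        self.by_uid = {m.uid: m for m in messages}  # hash for UID lookups

    def expunge(self, seq):
        """Handle '* <seq> EXPUNGE': every later message moves down by one."""
        msg = self.by_seq.pop(seq - 1)  # the list equivalent of the memmove()
        del self.by_uid[msg.uid]
        return msg

    def seq_of(self, uid):
        """msgstruct -> sequence lookup; a linear scan in this sketch."""
        return self.by_seq.index(self.by_uid[uid]) + 1
```

Because UIDs are strictly ascending in sequence order, the linear scan in `seq_of` could be replaced by a binary search over the vector; the scan is kept here only to show the cost being argued about.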

 There's also the big cost of fetching the UID map at session startup,
 which is completely unnecessary unless you have a local cache of the
 mailbox state.

Evolution has a local cache, so it wouldn't usually have to do that. More
problematic is keeping the message flags updated. Doing FETCH 1:* FLAGS
isn't exactly nice, but anything else limits the functionality. Well, I
guess updating summaries and virtual folders with SEARCH and fetching
flags for only the visible messages would be possible.

  I see sequence numbers useful only when you know you want to fetch
  exactly n messages, and even that doesn't work if some of those messages
  just happened to get deleted.
 
 Nope.  You missed the part in which a server can't do untagged expunges
 except at certain well-defined points.

That doesn't mean that the message couldn't still be physically deleted.
If it's gone, the server can't send it to the client.

  Actually in some situations relying on sequence numbers could even lose
  messages. Suppose a client (maybe a webmail) showing messages 1..10 on
  screen. Next-button would load the next 10. If the IMAP connection got
  closed before next-button was clicked and some of those messages were
  expunged before connection was up again, fetching 10..20 would have
  skipped over some of the unread messages.
 
 Ah, you're assuming a stupid webmail client which continually reopens new
 IMAP sessions for the same webmail session.

I was thinking more of a user who waits a long time before hitting the
next button. IMAP connections have to have some timeout in webmail
clients. Or maybe the connection got closed for some other reason
(network problems, server restart). Then again, maybe the whole session
should just be invalidated when the IMAP connection is lost.

That could be a bit problematic with normal IMAP clients as well. The
client would have to refetch the visible messages to make sure they
hadn't changed. But which messages exactly? If it couldn't be done by UID,
finding them wouldn't be that easy if their sequence numbers had changed.
The message under the cursor especially shouldn't change (the user could
just be hitting delete+expunge).

 You can do the same thing with message sequence numbers, and consume less
 bandwidth since sequence numbers are smaller than UIDs.

OK, so slightly lower bandwidth usage is a second plus for sequence numbers.

  I really don't see when/why sequences are so much better than UIDs.
 
 Unlike UIDs, you know exactly how many messages are in a sequence, even if
 you currently have no information about any of those messages in the
 client state.
 
 With UIDs, you have to download the UID map, which for a moderate to large
 mailbox (e.g. 5000 messages) is a substantial amount of data.  It will
 kill you unless you have a fast network.  Try it over CDPD (packet IP over
 cellular) sometime.

Depends on how the client is supposed to be used. If the client caches
messages locally, it doesn't have to download the whole UID map. And I
think that's a more common way to use IMAP than using mail clients on
phones, PDAs and such. They could of course keep on using sequence numbers
if that fits their typical use better.




mail vs. news ???

2003-02-21 Thread Ken Murchison
I find it interesting, if not disturbing, that some members of the
usenet community seem to think that mail messages and usenet articles
are not the same thing.  AFAICT, from reading the relevant standards,
writing server code for SMTP/LMTP/IMAP/POP3/NNTP, and everyday use, mail
messages and news articles both conform to RFC 2822 (RFC 1036 states as
much).  The only differences that I'm aware of are the following:

- usenet puts a greater restriction on the headers (although still being
RFC 2822 compliant)
- mail messages are typically transmitted over a 1-to-1 protocol (SMTP)
and news articles are typically transmitted over a 1-to-many protocol
(NNTP)

Could somebody please enlighten me as to any other differences,
perceived or otherwise?

-- 
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 21 Princeton Place
716-662-8973 x26  Orchard Park, NY 14127
--PGP Public Key--http://www.oceana.com/~ken/ksm.pgp



Re: Getting rid of the sequence numbers

2003-02-21 Thread Steve Hole
On 21 Feb 2003 07:34:59 +0200 Timo Sirainen [EMAIL PROTECTED] wrote:

 I'd like to know how you can make a client efficiently handle sequence
 numbers. If internal message structure contains just the sequence
 number, it has to be updated every time an older message is deleted.
 Since it's just memory it's not too slow, but I can't see how that could
 be better (from client's point of view) than simply using UIDs where you
 don't have to do any updates at all.

Sigh.   Please go back and read the archives of this group.   Have a look 
at the c-client code.   Really, it takes but a moment of thought to figure
out how to use sequence numbers efficiently and how to integrate them 
flawlessly with UID.   It also shows why they are useful in efficient 
client construction.

The biggest issue with sequence numbers (in my opinion) is that we haven't
used them to their full potential.   This is the basis of the debate over
how to do the sort, and subsequently thread, extensions.   That is also
water well under the bridge.

Cheers.

---
Steve Hole
Chief Technology Officer - Billing and Payment Systems
ACI Worldwide
mailto:[EMAIL PROTECTED]
Phone: 780-424-4922




Re: IMAP and Netnews

2003-02-21 Thread Ken Murchison


Charles Lindsey wrote:
 
 In [EMAIL PROTECTED] Mark Crispin 
[EMAIL PROTECTED] writes:
 
  But clients that interoperate with IMAP usually also have the capability
  to interoperate with POP3, SMTP, NNTP and maybe even UUCP. I have never
  seen any suggestion that those other servers are in any way obligated to
  fix things that the client is unable to swallow/digest.
 
 All clients and servers of these protocols are required to comply.  8-bit
 headers are non-compliant.
 
 No, there is no requirement AFAIK for POP3 or SMTP servers to handle
 Netnews articles. They are designed to cope with Email messages and Usefor
 is taking care to ensure that they need never see anything else. If they
 do, then some standard has not been complied with.

The use of the term "requirement" could be discussed at length, but you
can't put your head in the sand and ignore the fact that news articles DO
pass through POP3 (probably rarely), IMAP and SMTP.  One of the big
issues which keeps getting brought up by the usefor people is "current
practice".  That's fine, but you can't just restrict this argument to
current usenet practice.  You have to consider the entire playing
field, not just your corner of it.




Re: Getting rid of the sequence numbers

2003-02-21 Thread Timo Sirainen
On Fri, 2003-02-21 at 17:33, Simon Josefsson wrote:
  Not really, why would you _need_ to get a list of all messages? Client
  can request the messages from server only when they become visible in
  screen. Scrollbar sizes and such can be generated from just the total
  amount of messages. Before the message is loaded from server, client
  could just show "loading .." instead of the from/to/subject/whatever.
 
 This assumes a lot about the client that doesn't necessarily hold.  Not
 all clients generate INBOX summaries dynamically based on where the
 scrollbar is.  Some clients don't even have a scroll bar.

Right, they don't do it, but that doesn't mean that they couldn't. If
there's no scrollbar, that's even easier to handle, since there's
probably just next screenful and previous screenful (and maybe
top/bottom).

 Generally I think it is more productive to stop regarding certain
 client behaviour, that is valid according to the specification, as
 broken or bad.

I agree. If a client doesn't work well for some specific use that the
user wants, there are always other clients that may work better for that
purpose.

 Of course, my preferences may be biased by
 being an author of an IMAP client that wasn't designed around the
 scrollbar paradigm...

If a client was designed well enough, I don't think it should be too
difficult to modify it to fetch only the visible messages. The others
could simply contain dummy information with a not-fetched-yet flag set.
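
A minimal sketch of the "fetch only what is visible" idea: the summary list starts out full of placeholders, and the client computes the sequence set for the rows currently on screen. The function names and the placeholder fields are hypothetical, chosen only for this illustration.

```python
# Placeholder shown for messages that have not been fetched yet.
NOT_FETCHED = {"fetched": False, "from": "", "subject": "loading .."}


def make_summary(total_messages):
    """One placeholder entry per message; only the count is needed up front."""
    return [dict(NOT_FETCHED) for _ in range(total_messages)]


def visible_sequence_set(first_row, rows_on_screen, total_messages):
    """Return an IMAP sequence set such as '41:60' for the visible rows,
    or None when there is nothing to fetch."""
    if total_messages <= 0 or rows_on_screen <= 0:
        return None
    first = max(1, min(first_row, total_messages))
    last = min(total_messages, first + rows_on_screen - 1)
    return f"{first}:{last}"
```

The returned string would go into a request along the lines of `FETCH 41:60 (ENVELOPE FLAGS)`; entries outside the window keep their placeholder until they scroll into view.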




Re: Getting rid of the sequence numbers

2003-02-21 Thread DINH Viêt Hoà
 Not really, why would you _need_ to get a list of all messages? Client
 can request the messages from server only when they become visible in
 screen. Scrollbar sizes and such can be generated from just the total
 amount of messages. Before the message is loaded from server, client
 could just show "loading .." instead of the from/to/subject/whatever.

If you have to score messages or sort messages (not using the thread
extension of IMAP ^_^ ), you have to fetch all the messages.

-- 
DINH V. Hoa



Re: Getting rid of the sequence numbers

2003-02-21 Thread Cyrus Daboo
Hi Timo,

--On Friday, February 21, 2003 7:34 AM +0200 Timo Sirainen [EMAIL PROTECTED] 
wrote:

| Actually in some situations relying on sequence numbers could even lose
| messages. Suppose a client (maybe a webmail) showing messages 1..10 on
| screen. Next-button would load the next 10. If the IMAP connection got
| closed before next-button was clicked and some of those messages were
| expunged before connection was up again, fetching 10..20 would have
| skipped over some of the unread messages.

Well this is of course a bogus situation because no client can cache 
sequence numbers across different connections - they have to use UIDs for 
that. Most of the webmail solutions I know of that do not maintain 
persistent IMAP connections do use UIDs.
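
The skipped-message scenario can be reproduced with a small simulation. This is only a sketch: the mailbox is modelled as a plain list of UIDs in sequence order, and both function names are hypothetical.

```python
def next_page_by_sequence(mailbox_uids, first_seq, page_size):
    """UIDs of messages first_seq .. first_seq + page_size - 1 in the
    mailbox as it looks NOW (sequence numbers do not survive reconnects)."""
    start = first_seq - 1
    return mailbox_uids[start:start + page_size]


def next_page_by_uid(mailbox_uids, last_uid_shown, page_size):
    """Page forward from a remembered UID, which is stable across sessions."""
    return [uid for uid in mailbox_uids if uid > last_uid_shown][:page_size]


# Page one showed UIDs 101..110. While disconnected, 103 and 107 were expunged.
after_reconnect = [uid for uid in range(101, 131) if uid not in (103, 107)]
by_seq = next_page_by_sequence(after_reconnect, 11, 10)
by_uid = next_page_by_uid(after_reconnect, 110, 10)
```

Here `by_seq` starts at UID 113, silently skipping 111 and 112, while `by_uid` starts at 111 as the user would expect.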

This issue is in fact one of the major disadvantages to relying on sequence 
numbers given the propensity of network devices (e.g. firewalls, cable/dsl 
modems etc) to timeout idle connections at an interval less than the IMAP 
30 minute timeout. Those devices pretty much force online clients to
NOOP poll at a much shorter interval than they should need to, just to
keep the connection alive. If the connection does die, the client is forced to
effectively do a full resync of its cached state (or just throw it out and 
start over) if it attempts to recover the lost connection. This problem has 
been a major headache for me over the last few years.

--
Cyrus Daboo




Re: Getting rid of the sequence numbers

2003-02-21 Thread Mark Crispin
On 21 Feb 2003 19:02:26 +0200, Timo Sirainen wrote:
 OK, I looked through c-client and Pine code. It looks just as difficult
 as I expected. It uses multiple arrays for seq -> message lookups.

Bullshit.  There is one cache.  Don't get confused by the sortcache, which is
not a seq -> message lookup.

 It
 often has to go through the whole array just to find the message (for
 every message it has to fetch I think?).

That's only if you look up by UIDs, which a well-written client rarely has to
do if you use sequence numbers effectively.

I imagine that if I created a UID -> seq hash table you would then be saying it
has to maintain multiple tables to support sequence numbers.

 And of course it has to update
 the arrays every time messages are expunged.

The only reason why it does that update is for a feature that is so advanced
that most clients don't use it; a multithread client (such as my MailManager
application) can lock a message cache entry and prevent it from being
expunged even if it is expunged on the server.  That way it doesn't have to
maintain a separate copy, ever; but it does need a back pointer.

I doubt very much that there's anything like that in your UID-only client.

 Pine also doesn't seem to do any automatic reconnection to server. I'd
 think that would get annoying with bad internet connections.

It's rare to need to do automatic reconnection, even with flakey network
connections, if you do networking the way you're supposed to.  Such as not
killing a perfectly good TCP connection because of the slightest router flap.

Most of the need for automatic reconnection vanishes if you take the trouble
not to sacrifice connections.

CDPD is about as flakey as you can get, yet Pine works quite well with it.

 If it did
 do reconnecting, would the current code require resyncing everything
 after connect?

Since we never fetch everything, we never need to resync everything.  Just
toss out the local cache and do demand-fetching.

Demand-fetching is good.  Demand fetching means that you work on slow
connections.
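
A minimal sketch of demand-fetching with a throwaway cache, as described above. `DemandCache` and the `fetch` callable are hypothetical; a real client would issue an IMAP FETCH where the callable is invoked.

```python
class DemandCache:
    """Fetch a message's data only when asked for it; keep nothing
    across connections."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: sequence number -> message data
        self._cache = {}

    def get(self, seq):
        """Return cached data, going to the server only on a miss."""
        if seq not in self._cache:
            self._cache[seq] = self._fetch(seq)
        return self._cache[seq]

    def on_reconnect(self):
        """Sequence numbers are not valid across sessions, so toss it all."""
        self._cache.clear()
```

The cost of `on_reconnect` is bounded by how much was ever displayed, which is the point being made: a cache that never held everything never needs a full resync.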

Pine works well over CDPD.  Does your client?




Re: Unicode newsgroup name options

2003-02-21 Thread Russ Allbery
D J Bernstein [EMAIL PROTECTED] writes:

 Actually, there's very little opposition (especially among implementors)
 to requiring all MTAs, MUAs, etc. to handle UTF-8 messages. Eventually
 we will all be using UTF-8; all relevant bugs must be fixed. Only the
 wildest ``7 bits forever!'' proponents, such as Keith Moore, disagree.

 The real controversy is over whether we should also do _other_ things
 before UTF-8 is working everywhere. For example, should we introduce
 some ad-hoc 7-bit character encoding for newsgroup names?

 Many of us (especially implementors) believe that these short-term 7-bit
 kludges have huge costs (as illustrated by your message) and minuscule
 benefits. We believe that the 7-bit kludges should be dropped.

 Our opponents are claiming that the IESG will demand a 7-bit solution.
 But they aren't opposing the requirement of UTF-8 support; they're
 opposing the reliance on UTF-8 as the sole solution.

I'm not sure that I agree with your summary of the positions, but I'm
certainly sympathetic to this viewpoint.  Just implementing UTF-8 feels a
lot cleaner to me too.

However, my main interest personally is to get something published by the
IETF documenting the Usenet article format that isn't as horribly obsolete
and out of date as RFC 1036 is.  I'm also most definitely not a mail
system implementor or an IMAP implementor and don't know what issues
implementors in those areas face.

One additional option that I didn't mention would be to decide that the
IETF standards process is out of touch with the reality of what
implementors want and to then simply punt on specifying a non-ASCII
encoding for newsgroups in the standard (so as not to get dragged into
these arguments) and encourage anyone who wants to use a non-ASCII
character set in practice to use UTF-8.  This will work for most news
server software.  I personally don't have a sufficient grasp on the issues
facing news client implementors, IMAP implementors, or mail system
implementors to know whether that's a viable solution outside of news, and
if so, whether it's the best solution available.

-- 
Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/


Re: Unicode newsgroup name options

2003-02-21 Thread Mark Crispin
On Fri, 21 Feb 2003, Russ Allbery wrote:
 D J Bernstein [EMAIL PROTECTED] writes:
  Actually, there's very little opposition (especially among implementors)
  to requiring all MTAs, MUAs, etc. to handle UTF-8 messages. Eventually
  we will all be using UTF-8; all relevant bugs must be fixed. Only the
  wildest ``7 bits forever!'' proponents, such as Keith Moore, disagree.

I disagree, in the very strongest terms, with both this categorization of
the situation and with the personal attack.

  Many of us (especially implementors) believe that these short-term 7-bit
  kludges have huge costs (as illustrated by your message) and minuscule
  benefits. We believe that the 7-bit kludges should be dropped.

The definition of "many" and "we" is extremely subjective, particularly as
it is a minority opinion.

 One additional option that I didn't mention would be to decide that the
 IETF standards process is out of touch with the reality of what
 implementors want

The people who advocate this option are those individuals who have not
been able to get their way.  I strongly suggest that you plug your ears
when you hear this siren song, lest you cause a disaster for yourself and
wreak havoc on others.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.


Re: Getting rid of the sequence numbers

2003-02-21 Thread Mark Crispin
On Fri, 21 Feb 2003, Timo Sirainen wrote:
 But whenever sorting is done, there is the sort array that has to be
 updated and accessed slowly whenever you get fetch envelope reply
 (pine_imap_envelope -> mn_raw2m() -> msgno_in_sort()).

Wrong.  What you are seeing in Pine is a mapping from a view.  That is
something that would also have to be done (and be much slower) in a UID
client that had the same functionality.

 Also if it's sorted in any way. I guess sequences work well enough for
 unsorted mailboxes.

They work better than UIDs for sorted or unsorted.

  Since we never fetch everything, we never need to resync everything.  Just
  toss out the local cache and do demand-fetching.
 Meaning that you toss out a completely usable cache just because you
 don't want to use UIDs?

Since we never fetch all that much into the cache, there isn't that much
cost.  Note too that on shared machines a cache that persists beyond
sessions is a security bug.  Finally, if the user uses multiple machines,
then that's a lot of duplication of mail (including sensitive
information).

Persistent caching is only good if a user sticks to one or two dedicated
machines.

-- Mark --



Re: mail vs. news ???

2003-02-21 Thread Mark Crispin
On Fri, 21 Feb 2003, Russ Allbery wrote:
 Usenet's restrictions on the syntax of message ID headers are very
 specific and very precise, and much stronger than those of RFC 2822, in
 part because message IDs are used as part of the NNTP protocol.

What are those restrictions?

 Comments
 in various places that mail supports them are not well-supported by
 currently deployed Usenet software (although it certainly hurts nothing to
 support them when writing new code, other than adding complexity).  The
 space after the colon in headers is not optional on Usenet.  The syntax of
 the Date header is restricted in ways somewhat similar to that of the
 Message-ID header.

Golly gee, where's the chorus of "these are bugs that should be fixed
now"?  First we hear the claim that 7-bit messaging restrictions in mail
are a bug that should be fixed even though 7-bit was specifically in the
standard.

Now we hear the claim that completely unnecessary restrictions in
headers are necessary because of news software.

And the IETF/IESG is supposed to respect this?

 - National 8-bit character sets are in widespread use in Usenet message
 headers, possibly more widespread than they are in (non-spam) mail
 messages.  Untagged 8-bit national character sets are widely used in
 various non-English hierarchies in headers as the preferred way of
 including such content, and in some cases use of RFC 2047 is frowned on.

This is because portions of the news community listened to the siren song
of "just send 8-bits" offered by those individuals whose song was rejected
in mail.  Now the news community has a non-interoperable disaster.

But rather than fix the disaster, they seem to want to inflict a new
disaster upon the email community.

The solution to interoperability is to stop claiming that news is special,
and start playing ball with the rest of the messaging world.  This means
making compromises, including at times accepting what seem to be
unnecessary limitations, in order to achieve interoperability.

-- Mark --



Re: Unicode newsgroup name options

2003-02-21 Thread Mark Crispin
On Thu, 20 Feb 2003, Russ Allbery wrote:
   | 1   2   3   4   5   6   7   8   9  10  11  12  13  14
  -+------------------------------------------------------
  A| D   C   N   N   D   Y   N   N   N   Y   N   D   Y   D
  B| Y   Y   C   Y   Y   N   N   N   N   N   N   N   C   D
  C| D   C   C   C   D   N   N   N   N   N   N   N   C   D

Choice A has three Ys, four Ds, and one C.

Choice B has four Ys, one D, and two Cs.

Choice C, the ugly duckling, has no Ys, three Ds, and four Cs.

Choice A has one fewer N than B or C.

If you believe in the premise that a Y is more expensive than a D, and
a D is more expensive than a C, then choice C (punycode everywhere)
stands out as the preferable choice.  The more expensive that you rate a
Y compared to a D or C, the better that choice C looks.

Now, let's factor out the items in which all three choices are equivalent,
and the superiority of choice C becomes even more apparent.

 | 1   2   3   4   5   6   10  12  13
-+-----------------------------------
A| D   C   N   N   D   Y   Y   D   Y
B| Y   Y   C   Y   Y   N   N   N   C
C| D   C   C   C   D   N   N   N   C
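
The tallies and the factoring-out step above can be checked mechanically. The sketch below uses the rows of the full 14-column table; the numeric weights are purely an assumption for illustration (the thread only says Y costs more than D, and D more than C), and N is assumed to cost nothing.

```python
from collections import Counter

TABLE = {
    "A": "D C N N D Y N N N Y N D Y D".split(),
    "B": "Y Y C Y Y N N N N N N N C D".split(),
    "C": "D C C C D N N N N N N N C D".split(),
}

# Hypothetical weights: only the ordering Y > D > C comes from the thread.
WEIGHTS = {"Y": 3, "D": 2, "C": 1, "N": 0}


def tally(row):
    """Count how often each letter appears in one choice's row."""
    return Counter(row)


def differing_columns(table):
    """1-based column numbers where the choices are not all equivalent."""
    columns = zip(*table.values())
    return [i for i, col in enumerate(columns, start=1) if len(set(col)) > 1]


def cost(row, weights=WEIGHTS):
    """Total weighted cost of one choice."""
    return sum(weights[cell] for cell in row)


costs = {choice: cost(row) for choice, row in TABLE.items()}
```

With these assumed weights, choice C has the lowest total, and `differing_columns(TABLE)` returns exactly the nine columns kept in the reduced table. Different weights could change the ranking, which is why the weighting is stated as a premise.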

-- Mark --



Re: Getting rid of the sequence numbers

2003-02-21 Thread Mark Crispin
On Fri, 21 Feb 2003, Timo Sirainen wrote:
 I don't know anyone who accesses their mail from more than a few
 computers.

In a typical day, I use from three to five different computers to access
my mail.  So do my co-workers.

 I use IMAP only at home for accessing my mail, elsewhere I
 just ssh into my server and read the mail there. Good and secure ssh
 clients are easier to find and set up than IMAP clients.

Well, you stated your problem: you don't use a good IMAP client.

I wouldn't think of ssh'ing into my server to read my mail.  I happen to
have that privilege as a server software maintainer, but most users can
not.

I especially would not think of ssh'ing into a server to read mail via
CDPD.  Shudder.

Yet, with a good IMAP client such as Pine, I have a difficult time telling
that I'm using CDPD instead of an 802.11b wireless, unless the message is
large.  Since I pay a flat rate for CDPD, I'll use it instead of 802.11b
when travelling if the local access point charges for 802.11b.

-- Mark --



Re: mail vs. news ???

2003-02-21 Thread Russ Allbery
Mark Crispin [EMAIL PROTECTED] writes:
 On Fri, 21 Feb 2003, Russ Allbery wrote:

 Usenet's restrictions on the syntax of message ID headers are very
 specific and very precise, and much stronger than those of RFC 2822, in
 part because message IDs are used as part of the NNTP protocol.

 What are those restrictions?

The primary ones are:

 * Absolutely no occurrences of either whitespace or the > character,
   escaped or not, are permitted inside the message ID.  Either is known
   to break existing software in various ways.

 * Nothing is permitted in the Message-ID header other than the message ID
   itself.  Comments either preceding or following the message ID will
   cause the message to be rejected by many news servers.

 * The message ID must not be longer than about 500 characters.  The
   failure mode for violating this rule tends to be rather nasty for some
   existing NNTP software, including things like desynchronization of the
   protocol between the client and server.  NNTP (unfortunately) has a
   maximum command length defined as part of the protocol.  In practice,
   many news servers enforce a 250 octet limit (including the surrounding
   angle brackets).
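
As a rough illustration, the three restrictions could be checked like this. It is a sketch, not any news server's actual validation: the function name is hypothetical, the forbidden character is taken to be `>` (an assumption), and the default limit uses the 250-octet figure mentioned above.

```python
def message_id_problems(value, max_octets=250):
    """Return a list of reasons why a Message-ID header value would be
    risky on Usenet; an empty list means none of these checks failed."""
    problems = []
    bracketed = len(value) >= 2 and value.startswith("<") and value.endswith(">")
    if not bracketed:
        # Covers leading or trailing comments and any other extra content.
        problems.append("header value is not a single <...> message ID")
    inner = value[1:-1] if bracketed else value
    if any(ch.isspace() for ch in inner):
        problems.append("contains whitespace")
    if ">" in inner:
        problems.append("contains '>' inside the ID")
    if len(value.encode("utf-8")) > max_octets:
        problems.append(f"longer than {max_octets} octets including brackets")
    return problems
```

For example, `message_id_problems("<abc@example.com>")` returns an empty list, while a value with a leading comment such as `(c) <abc@example.com>` is flagged.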

Please note that I'm not arguing that these restrictions are desirable,
simply that violating them *will* break existing news software.  I also
don't think that fixing one and possibly two is really worth the effort,
since there isn't much in the way of useful purpose served by not
following those rules anyway.

 Comments in various places that mail supports them are not
 well-supported by currently deployed Usenet software (although it
 certainly hurts nothing to support them when writing new code, other
 than adding complexity).  The space after the colon in headers is not
 optional on Usenet.  The syntax of the Date header is restricted in
 ways somewhat similar to that of the Message-ID header.

 Golly gee, where's the chorus of "these are bugs that should be fixed
 now"?

Are you expecting me to serve as the chorus?  I certainly hope that you're
not expecting me to try to be consistent with statements made by other
people that I don't necessarily agree with.  I tend to hold my own
opinions and not necessarily agree with other people.  :)

 First we hear the claim that 7-bit messaging restrictions in mail are a
 bug that should be fixed even though 7-bit was specifically in the
 standard.

 Now we hear the claim that completely unnecessary restrictions in headers
 are necessary because of news software.

These restrictions are published in RFC 1036, so I would not expect them
to be a surprise to news implementors.  Usenet has, since B news, used a
subset of the mail messaging format.

RFC 1036 is unfortunately imprecise about exactly what additional
restrictions it puts on the message format, but at the least the space
after the colon in headers is quite explicit.  The message ID restrictions
are also fairly clear apart from the length limitation (which falls out of
the NNTP protocol instead).  (The bit in RFC 1036 about slashes being
strongly discouraged in message IDs is now completely obsolete.)

The Date specification in RFC 1036 is obnoxious, referring to a particular
software implementation that isn't documented as part of the standard.  In
practice, an RFC 2822 date that doesn't use any of the obsolete syntax is
fine provided that the header is not folded.

Issues surrounding comments are more complex.  Apart from Date and
Message-ID, which are the most sensitive headers that are also shared with
mail, comments in References headers are unlikely to cause catastrophic
problems but may show up as oddities in the thread tree in a news reader,
and news software can be fairly picky about the From header (although one
is likely fine as long as one avoids the obsolete syntax rules).

 And the IETF/IESG is supposed to respect this?

My message was solely addressing the differences *in practice* that exist
right now on the wire.  I was not attempting to make any sort of statement
about what the future should look like.

I personally am very strongly in favor of the unification of messaging
formats, and think that this is one of the most important things that
could come out of USEFOR.  I think that it's reasonable to simply require
that Usenet software going forward cope with comments in the References
header and with the full From syntax in RFC 2822 (possibly omitting the
obsolete rules, since they have never been supported on Usenet).

I'm ambivalent about folded dates.  The date parsing software that I've
written personally and that is used in the software I maintain supports
them.  I don't understand why anyone would generate a folded date, though,
so I can understand why people don't see what purpose is served in
supporting it.

I think that not requiring a space after the colon in headers (except for
compatibility with older messages) is silly, but I don't have a strong
opinion on it.  Changing news software to support this can be a 

Re: Getting rid of the sequence numbers

2003-02-21 Thread Rob Siemborski
On Fri, 21 Feb 2003, Timo Sirainen wrote:

 I don't know anyone who accesses their mail from more than a few
 computers. I use IMAP only at home for accessing my mail, elsewhere I
 just ssh into my server and read the mail there. Good and secure ssh
 clients are easier to find and set up than IMAP clients.

Imagine an environment such as a university where there are many publicly
accessible cluster machines.  Or perhaps a corporation which doesn't
assign specific cubicles and instead has a large number of nonspecific
desktop machines.

In such an environment, not only are different machines being used by many
different users continuously, but individual users are not limited to
specific machines.  (Of course, this has its own issues like ensuring
preferences follow the users, but it's a very real environment).

-Rob

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Rob Siemborski | Andrew Systems Group * Research Systems Programmer
PGP:0x5CE32FCC | Cyert Hall 207 * [EMAIL PROTECTED] * 412.268.7456
-BEGIN GEEK CODE BLOCK
Version: 3.12
GCS/IT/CM/PA d- s+: a-- C$ ULS$ P+++$ L+++() E W+ N o? K-
w O- M-- V-- PS+ PE++ Y+ PGP+ t+ 5+++ R tv- b+ DI+++ G e h r- y?
--END GEEK CODE BLOCK-



Re: Unicode newsgroup name options

2003-02-21 Thread Russ Allbery
Mark Crispin [EMAIL PROTECTED] writes:

 Now, let's factor out the items in which all three choices are
 equivalent, and the superiority of choice C becomes even more
 apparent.

  | 1   2   3   4   5   6   10  12  13
 -+-----------------------------------
 A| D   C   N   N   D   Y   Y   D   Y
 B| Y   Y   C   Y   Y   N   N   N   C
 C| D   C   C   C   D   N   N   N   C

That was the conclusion that was jumping out at me as well, but here are a
few other things to keep in mind:

 * These columns don't really have equal weight, in that some of them
   represent small numbers of installations of relatively easily changed
   software and some of them represent very large installed bases or
   software that's difficult to change.  In particular, (1) and (5), the
   installed base of news readers, will take a very long time to change,
   and (2), (3), and (4), the news server software, we know is rarely
   updated and upgrades are difficult to motivate.  Usenet server software
   routinely runs for years on autopilot without any maintenance.

   By comparison, (6), the process that sends mail to moderated groups, is
   a small and easily changed component in most situations, and the number
   of moderators (10), news to mail gateways (12), and IMAP servers
   serving news messages (13) are all, while certainly significant, much
   smaller than the number of installations of the core Usenet software.

 * The single largest set of installed software, (1) and (5), is almost a
   C for proposal (A).  We know that some existing software will work with
   UTF-8 newsgroup names out of the box without modification, although it
   will require some tweaking for ideal operation.  By comparison,
   punycode (C) we know won't work correctly with *any* existing software;
   the only reason why that column is a D instead of Y is that users can
   use the funny-looking encoded names and still participate in the
   groups.

   This is one of the stronger arguments in favor of (A), namely that you
   can implement it to a surprising degree without changing any news
   software at all.

 * There are other issues not reflected on this matrix at all, such as
   complexity of implementation and compatibility with the existing
   messaging format, that weigh in various directions.

Again, this is not to disagree with your conclusion.  I just wanted to
point out that while I found the table helpful, it's a bit over-simplified
and hides the nature of some of the issues and tradeoffs.

-- 
Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/


Re: Getting rid of the sequence numbers

2003-02-21 Thread Mark Crispin
On Fri, 22 Feb 2003, Timo Sirainen wrote:
  Well, you stated your problem: you don't use a good IMAP client.
 That could be it. Installing and running would have to be as easy as
 sshing with putty though. Meaning you could get imapclient.exe from web
 page which you can run directly, only required configuration should be
 entering username, password and IMAP server host.

That sounds like PC Pine.

 Actually it would also need SMTP server configuration unless it acts as
 such itself. Maybe IMAP server could give some SMTP configuration hints
 to client so user wouldn't have to set those manually.

We don't have that, but we do have a way to have a remote Pine
configuration so that a user can use Pine on multiple platforms and get
their one true configuration each time.

We even have a way to do NNTP proxy via IMAP, with minimal noticeable
difference between that and direct NNTP.  That keeps your newsrc in the
same place too.

 So on topic summary: I'm
 not against clients doing well optimized server fetches, but I don't
 think clients failing to do so are useless crap.

In my experience, clients which make dumb mistakes such as:
1 UID FETCH 237 FLAGS
* 3 FETCH (FLAGS (\Seen) UID 237)
1 OK done
2 UID FETCH * FLAGS
* 4 FETCH (FLAGS (\Seen) UID 483)
2 OK done
4 UID FETCH 238:482 FLAGS
4 OK no messages there
will do other stupid things as well.

It gets worse.  The same UID-only client that fails to realize that
there can't be any UIDs between a UID with sequence 3 and a UID with
sequence 4, also fails to grasp that the failure to find any UIDs in
that range means that there won't ever be any in that range, and keeps
on trying to find them.
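
The check missing from the client in the trace above can be sketched in a
few lines of Python. This is only an illustration: the function name and
its arguments are hypothetical, and it assumes the client remembers the
sequence number that came back with each UID in the FETCH responses.

```python
def uid_range_between(seq_a, uid_a, seq_b, uid_b):
    """Return the UID range that could hold messages strictly between two
    known messages, or None when no such message can exist."""
    # Put the two messages in ascending sequence order.
    if seq_a > seq_b:
        seq_a, uid_a, seq_b, uid_b = seq_b, uid_b, seq_a, uid_a
    # Adjacent sequence numbers (or adjacent UIDs) leave no room for
    # another message, so there is nothing to ask the server for.
    if seq_b - seq_a <= 1 or uid_b - uid_a <= 1:
        return None
    return (uid_a + 1, uid_b - 1)


# The case from the trace: UID 237 is sequence 3, UID 483 is sequence 4.
print(uid_range_between(3, 237, 4, 483))
```

With the values from the trace the function returns None, so the
UID FETCH 238:482 would never be sent.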

There are clients which do UID FETCH 1:* UID repeatedly in the same
session.  Some of these clients do it as a poll for new mail, since they
disregard the EXISTS response.

Then there are the clients which spawn connections for no good reason, but
that's another story.

This isn't "not doing well-optimized server fetches".  This is doing
well-pessimized server fetches.

 And I still don't see
 how sequences would be inherently better for client to use than UIDs.

If you don't use sequences, then each and every cache reference requires a
lookup to locate the associated message for the UID in the cache.  At
best, this is a hash.  With sequences, it's an index.
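
A rough Python sketch of the two lookup styles being compared. The
Message class, the sample UIDs, and the function names are made up for
illustration and do not come from any real client.

```python
from dataclasses import dataclass


@dataclass
class Message:
    uid: int
    subject: str


messages = [Message(101, "first"), Message(237, "second"), Message(483, "third")]

# Sequence-based cache: sequence number n lives at list position n - 1.
by_seq = list(messages)

# UID-based cache: every reference goes through a hash table.
by_uid = {m.uid: m for m in messages}


def lookup_by_seq(seq):
    """Plain array index."""
    return by_seq[seq - 1]


def lookup_by_uid(uid):
    """Hash lookup keyed on UID."""
    return by_uid[uid]
```

Both calls find the same object; the difference is only in how they get
there, an index in one case and a hash probe in the other.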

If you use sequences, you know when you get an EXISTS precisely how many
new messages there are (if in fact there are any).  With UID-only, you
have to do lastuid+1:* and woe be it to you if the server is one of the
broken ones (like Courier) which incorrectly assumes that the left side of
the : must be less than the right side.
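
To make the difference concrete, here is a small Python sketch. The
function names are hypothetical. The second function models how a server
would resolve a UID range on the reading described above, where the two
sides of the : may come in either order and * stands for the highest UID
in the mailbox.

```python
def new_message_seq_range(known_count, exists):
    """Sequence range of new messages implied by an EXISTS response,
    or None if the mailbox did not grow."""
    if exists <= known_count:
        return None
    return f"{known_count + 1}:{exists}"


def resolve_uid_range(uids, start, end=None):
    """Resolve start:end against a sorted list of UIDs.

    end=None stands for '*', i.e. the highest UID present.
    """
    if not uids:
        return []
    if end is None:
        end = uids[-1]
    lo, hi = min(start, end), max(start, end)
    return [u for u in uids if lo <= u <= hi]


# Client knew 4 messages, server says "* 6 EXISTS": fetch 5:6.
print(new_message_seq_range(4, 6))

# UID-only client with lastuid = 483 and no new mail asks for 484:*.
print(resolve_uid_range([101, 237, 483], 484))
```

In the second call the range collapses to 483:484, so a conforming
server hands back the last message even though nothing new arrived, and
the client has to be prepared to discard it.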

If you use sequences, you know when you get an EXPUNGE precisely which
message was expunged (hence your request for an extra UIDEXPUNGE, which
would burden all IMAP sessions with additional traffic -- remember that
EXPUNGE can be unsolicited).
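
Handling an EXPUNGE in a sequence-indexed cache can be as small as the
following Python sketch. The function name and the sample UIDs are
invented for the example.

```python
def apply_expunge(by_seq, seq):
    """Drop the entry with sequence number seq.

    Everything after it moves down by one, which is exactly the
    renumbering the server has just performed.
    """
    return by_seq.pop(seq - 1)


# Cache of UIDs indexed by sequence number - 1.
cache = [101, 237, 300, 483]

# Server sends "* 2 EXPUNGE" twice. The second response refers to the
# message that was sequence 3 before the first one was applied.
removed = [apply_expunge(cache, 2), apply_expunge(cache, 2)]
print(removed, cache)
```

No UID needs to travel with the response; the client already knows which
entry sat at that position.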

If you use sequences, your commands, especially when the sets get large,
will be much smaller than with UIDs.  So will your SEARCH, SORT, and
THREAD results.
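
A quick Python illustration of the size difference, using a naive
encoder that only merges runs of consecutive numbers. The function name
and the sample UIDs are assumptions for the example.

```python
def to_message_set(numbers):
    """Encode numbers as an IMAP-style set, merging consecutive runs
    into first:last ranges."""
    nums = sorted(set(numbers))
    parts = []
    i = 0
    while i < len(nums):
        j = i
        # Extend the run while the next number is exactly one higher.
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        parts.append(str(nums[i]) if i == j else f"{nums[i]}:{nums[j]}")
        i = j + 1
    return ",".join(parts)


# The same five messages, named by sequence number and by UID.
print(to_message_set([1, 2, 3, 4, 5]))
print(to_message_set([101, 237, 300, 483, 512]))
```

Sequence numbers are contiguous by construction, so they collapse into
ranges; UIDs in a mailbox that has seen deletions usually do not.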

Last but not least, if you use sequences, you as a programmer are
compelled to consider silly cases (such as I indicated above), and avoid
doing them.  You can't build a sequence between, but not including, 3 and
4; therefore you know that there's nothing to do.

With UID-only, the silliness of what you may be doing is obscured from
you, and your client ends up embarrassing itself to server maintainers who
run protocol traces to answer their customers' questions as to "why is it
so slow?"

Time and time again, I hear the advocates of UID-only claim that what they
are doing is better or more efficient.  Time and time again, when I
see what the client does over the wire, these claims ring hollow.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.


Re: mail vs. news ???

2003-02-21 Thread Russ Allbery
Ken Murchison [EMAIL PROTECTED] writes:

 OK.  As I suspected there is nothing inherent in RFC [2]822 that makes
 it unsuitable for news.

Spaces in message IDs make them unsuitable for news.  This really, really
does break things, honest, I swear.  I'm not just making this up.  :)
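
A gateway that wants to catch this before injecting a message into news
could do a rough check like the Python sketch below. The function name is
hypothetical, and the test covers only the whitespace problem described
here plus the enclosing angle brackets; it is not a full message ID
parser.

```python
def has_news_safe_message_id(message_id):
    """Rough check: enclosed in <...> and free of whitespace."""
    mid = message_id.strip()
    if not (mid.startswith("<") and mid.endswith(">")):
        return False
    # Any embedded space, tab, or line break is a problem for news.
    return not any(ch.isspace() for ch in mid)


print(has_news_safe_message_id("<abc123@example.org>"))
print(has_news_safe_message_id('<"a b"@example.org>'))
```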

In general, the statements in RFC 1036 putting additional limitations on
the article format *are* implemented on Usenet and *are* relied on in
practice by Usenet software.

But, rewriting your statement to say "there is nothing inherent in RFC
2822 with RFC 1036 limitations applied that makes it unsuitable for news",
I agree.

 So, the claims that mail and news are not the same was either misleading
 and/or wishful thinking.

That is quite certainly my opinion.

-- 
Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/


Re: Unicode newsgroup name options

2003-02-21 Thread ned
 The wildmat problem is a red herring.  Wildmat implementations need to be
 cognizant of Unicode in far more substantial ways than merely overcoming
 punycode issues.  A well-thought-out stringprep requirement will help
 some, but then the stringprep has to be implemented.

This last point of Mark's is worth noting. Regardless of whether you end up
with UTF-8, punycode, UTF-7, or some other encoding of Unicode, a stringprep
profile is going to be required in order to move forward. It can either be
specified directly or indirectly as part of the underlying encoding, but it has
to be there.

Part of the problem with having a group take so long to complete its work is
that what's minimally acceptable to the IETF/IESG changes over time. It used to
be that a protocol could just use some encoding of Unicode, end of story. No
more. Now that stringprep exists the IESG requires it be used when Unicode
protocol elements are involved. Several recent specifications have been
returned to their respective working groups because they didn't do this.

I note in passing that while the current News Article Format Draft talks about
normalization of group names, it does so without reference to stringprep. This
would need to change: Stringprep covers stuff besides normalization, and more
generally provides a checklist for Unicode usage each protocol needs to
consider.
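
To give a feel for what "besides normalization" means, here is a Python
sketch of the map, normalize, and prohibit steps using the standard
library's stringprep tables. The profile is made up for illustration (it
is not a proposed newsgroup-name profile), the function name is
hypothetical, and the bidirectional checks are left out.

```python
import stringprep
import unicodedata


def prep_group_name(name):
    """Toy stringprep-style pipeline: map, normalize, prohibit."""
    # Map: drop "mapped to nothing" characters (table B.1) and apply
    # the case-folding map for use with NFKC (table B.2).
    mapped = []
    for ch in name:
        if stringprep.in_table_b1(ch):
            continue
        mapped.append(stringprep.map_table_b2(ch))
    # Normalize: NFKC. Stringprep is defined against Unicode 3.2; the
    # current Unicode database is used here for simplicity.
    normalized = unicodedata.normalize("NFKC", "".join(mapped))
    # Prohibit: spaces (C.1.1, C.1.2) and control characters
    # (C.2.1, C.2.2) are rejected in this toy profile.
    for ch in normalized:
        if stringprep.in_table_c11_c12(ch) or stringprep.in_table_c21_c22(ch):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return normalized


print(prep_group_name("Comp.Lang"))
```

A real profile would also have to decide on unassigned code points,
bidirectional text, and which additional characters to prohibit.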

I also note that Dan Kohn's alternative News Article Format draft uses
punycode, which already includes a stringprep profile.

Ned