Re: Getting rid of the sequence numbers
On Fri, 2003-02-21 at 08:17, Mark Crispin wrote: On Thu, 21 Feb 2003, Timo Sirainen wrote: I'd like to know how you can make a client efficiently handle sequence numbers. If internal message structure contains just the sequence number, it has to be updated every time an older message is deleted. An obvious structure is a vector of pointers to a msgstruct, indexed by sequence number. All you have to do for EXPUNGE is byte-blat the pointers down; that's a memmove() which some processors implement in hardware. msgstruct - sequence lookups would then have to find the msgstruct from the array. Not too fast operation really. You don't have to do the blat to EXPUNGE based upon UIDs, but you have to have some sort of hash based on UID to locate the msgstruct. Locating a msgstruct is a much more common operation than expunging one. And creating fetch request for a message is almost as common operation. Doing several slow array lookups there to find out the sequences could well be overall slower than constant UID hash lookups. There's also the big cost of fetching the UID map at session startup, which is completely unnecessary unless you have a local cache of the mailbox state. Evolution has local cache so it wouldn't usually have to do that. More problematic is keeping the message flags updated. Doing FETCH 1:* FLAGS isn't exactly nice, but anything else limits the functionality. Well, I guess updating summaries and virtual folders with SEARCH and fetching flags for only visible messages would be possible. I see sequence numbers useful only when you know you want to fetch exactly n messages, and even that doesn't work if some of those messages just happened to get deleted. Nope. You missed the part in which a server can't do untagged expunges except at certain well-defined points. That doesn't mean that the message still couldn't be physically deleted. If it's gone, server can't send it to client. 
Actually in some situations relying on sequence numbers could even lose messages. Suppose a client (maybe a webmail) showing messages 1..10 on screen. Next-button would load the next 10. If the IMAP connection got closed before next-button was clicked and some of those messages were expunged before connection was up again, fetching 10..20 would have skipped over some of the unread messages. Ah, you're assuming a stupid webmail client which continually reopens new IMAP sessions for the same webmail session. I was more of thinking a user that waits for a long time before hitting the next button. IMAP connections have to have some timeouts in webmails. Or maybe the connection got closed for some other reason (network problems, server restart). Then again, maybe the whole session should just be invalidated when IMAP connection is lost. That could be a bit problematic with normal IMAP clients as well. It would have to refetch the visible messages to make sure they weren't changed. But what messages exactly? If it couldn't be done by UID, finding them wouldn't be that easy if it's sequence had changed. Message under cursor especially shouldn't change (user could just be hitting delete+expunge). You can do the same thing with message sequence numbers, and consume less bandwidth since sequence numbers are smaller than UIDs. OK, so slightly smaller bandwidth usage is second plus in sequences. I really don't see when/why sequences are so much better than UIDs. Unlike UIDs, you know exactly how many messages are in a sequence, even if you currently have no information about any of those messages in the client state. With UIDs, you have to download the UID map, which for a moderate to large mailbox (e.g. 5000 messages) is a substantial amount of data. It will kill you unless you have a fast network. Try it over CDPD (packet IP over cellular) sometime. Depends on how client is supposed to be used. 
If client caches the messages locally it doesn't have to download the whole UID map. And I think that's more common way to use IMAP than using mail clients in phones, PDAs and such. They could of course keep on using sequences if it fits better to their typical use.
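A rough Python sketch of the two client-side structures debated in this message: a list indexed by sequence number, where EXPUNGE shifts the later entries down, next to a dict keyed by UID. The class and method names (`Msg`, `Mailbox`, `seq_of`) are made up for illustration and are not taken from c-client or any real client.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    uid: int
    subject: str = ""


class Mailbox:
    """Hypothetical client-side cache; all names are illustrative only."""

    def __init__(self, msgs):
        self.by_seq = list(msgs)                # index 0 holds sequence number 1
        self.by_uid = {m.uid: m for m in msgs}  # optional UID hash

    def get(self, seq):
        # sequence -> msgstruct: a plain index
        return self.by_seq[seq - 1]

    def expunge(self, seq):
        # Deleting from a list shifts later entries down, the analogue of memmove()
        msg = self.by_seq.pop(seq - 1)
        del self.by_uid[msg.uid]
        return msg

    def seq_of(self, uid):
        # msgstruct -> sequence: a linear scan, the slow direction discussed above
        for seq, msg in enumerate(self.by_seq, start=1):
            if msg.uid == uid:
                return seq
        return None
```

The sketch keeps both indexes only to make the trade-off visible: lookup by sequence is an index, lookup by UID is a hash, and mapping a message back to its current sequence number needs either a scan or extra bookkeeping on every EXPUNGE.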
mail vs. news ???
I find it interesting, if not disturbing, that some members of the usenet community seem to think that mail messages and usenet articles are not the same thing. AFAICT, from reading the relevant standards, writing server code for SMTP/LMTP/IMAP/POP3/NNTP, and everyday use, mail messages and news articles both conform to RFC 2822 (RFC 1036 states as much). The only differences that I'm aware of are the following:

- usenet puts a greater restriction on the headers (although still being RFC 2822 compliant)
- mail messages are typically transmitted over a 1-to-1 protocol (SMTP) and news articles are typically transmitted over a 1-to-many protocol (NNTP)

Could somebody please enlighten me as to any other differences, perceived or otherwise?

--
Kenneth Murchison Oceana Matrix Ltd. Software Engineer 21 Princeton Place 716-662-8973 x26 Orchard Park, NY 14127 --PGP Public Key--http://www.oceana.com/~ken/ksm.pgp
Re: Getting rid of the sequence numbers
On 21 Feb 2003 07:34:59 +0200 Timo Sirainen [EMAIL PROTECTED] wrote: I'd like to know how you can make a client efficiently handle sequence numbers. If internal message structure contains just the sequence number, it has to be updated every time an older message is deleted. Since it's just memory it's not too slow, but I can't see how that could be better (from client's point of view) than simply using UIDs where you don't have to do any updates at all. Sigh. Please go back and read the archives of this group. Have a look at the c-client code. Really, it takes but a moment of thought to figure out how to use sequence numbers efficiently and how to integrate them flawlessly with UID. It also shows why they are useful in efficient client construction. The biggest issue with sequence numbers (in my opinion) is that we haven't used them to their full potential. This is the basis of the how to do sort, and subsequently thread, extensions debate. That is also water well under the bridge. Cheers. --- Steve Hole Chief Technology Officer - Billing and Payment Systems ACI Worldwide mailto:[EMAIL PROTECTED] Phone: 780-424-4922
Re: IMAP and Netnews
Charles Lindsey wrote: In [EMAIL PROTECTED] Mark Crispin [EMAIL PROTECTED] writes: But clients that interoperate with IMAP usually also have the capability to interoperate with POP3, SMTP, NNTP and maybe even UUCP. I have never seen any suggestion that those other servers are in any way obligated to fix things that the client is unable to swallow/digest. All clients and servers of these protocols are required to comply. 8-bit headers are non-compliant. No, there is no requirement AFAIK for POP3 or SMTP servers to handle Netnews articles. They are designed to cope with Email messages and Usefor is taking care to ensure that they need never see anything else. If they do, then some standard has not been complied with. The use of the term requirement could be discussed at length, but you can't put your head in the sand and ignore the fact that new articles DO pass through POP3 (probably rarely), IMAP and SMTP. One of the big issues which keeps getting brought up by the usefor people is current practice. That's fine, but you can't just restrict this argument to current usenet practice. You have to consider the entire playing field, not just your corner of it. -- Kenneth Murchison Oceana Matrix Ltd. Software Engineer 21 Princeton Place 716-662-8973 x26 Orchard Park, NY 14127 --PGP Public Key--http://www.oceana.com/~ken/ksm.pgp
Re: Getting rid of the sequence numbers
On Fri, 2003-02-21 at 17:33, Simon Josefsson wrote:

Not really, why would you _need_ to get a list of all messages? Client can request the messages from server only when they become visible on screen. Scrollbar sizes and such can be generated from just the total amount of messages. Before the message is loaded from server, client could just show loading .. instead of the from/to/subject/whatever.

This assumes a lot about the client that doesn't necessarily hold. Not all clients generate INBOX summaries dynamically based on where the scrollbar is. Some clients don't even have a scroll bar.

Right, they don't do it, but that doesn't mean that they couldn't. If there's no scrollbar, that's even easier to handle then since there's probably just next screenful and previous screenful (and maybe top/bottom).

Generally I think it is more productive to stop regarding certain client behaviour, that is valid according to the specification, as broken or bad.

I agree. If client doesn't work well for some specific use that user wants, there are always other clients that may work better for that purpose.

Of course, my preferences may be biased by being an author of an IMAP client that wasn't designed around the scrollbar paradigm...

If client was designed well enough, I don't think it should be too difficult to modify to support fetching only visible messages. Others could simply contain dummy information with not-fetched-yet flag on.
Re: Getting rid of the sequence numbers
Not really, why would you _need_ to get a list of all messages? Client can request the messages from server only when they become visible in screen. Scrollbar sizes and such can be generated from just the total amount of messages. Before the message is loaded from server, client could just show loading .. instead of the from/to/subject/whatever.

if you have to score messages or sort messages (not using thread extension of IMAP ^_^ ), you have to fetch all the messages.

--
DINH V. Hoa
Re: Getting rid of the sequence numbers
Hi Timo,

--On Friday, February 21, 2003 7:34 AM +0200 Timo Sirainen [EMAIL PROTECTED] wrote:

| Actually in some situations relying on sequence numbers could even lose
| messages. Suppose a client (maybe a webmail) showing messages 1..10 on
| screen. Next-button would load the next 10. If the IMAP connection got
| closed before next-button was clicked and some of those messages were
| expunged before connection was up again, fetching 10..20 would have
| skipped over some of the unread messages.

Well this is of course a bogus situation because no client can cache sequence numbers across different connections - they have to use UIDs for that. Most of the webmail solutions I know of that do not maintain persistent IMAP connections do use UIDs.

This issue is in fact one of the major disadvantages to relying on sequence numbers given the propensity of network devices (e.g. firewalls, cable/dsl modems etc) to timeout idle connections at an interval less than the IMAP 30 minute timeout. Those devices pretty much force online clients to have to NOOP poll at a much shorter interval than they ought to to keep the connection alive. If the connection does die, the client is forced to effectively do a full resync of its cached state (or just throw it out and start over) if it attempts to recover the lost connection. This problem has been a major headache for me over the last few years.

--
Cyrus Daboo
Re: Getting rid of the sequence numbers
On 21 Feb 2003 19:02:26 +0200, Timo Sirainen wrote: OK, I looked through c-client and Pine code. It looks just as difficult as I expected. It uses multiple arrays for seq - message lookups. Bullshit. There is one cache. Don't get confused by the sortcache which is not seq-message lookup. It often has to go through the whole array just to find the message (for every message it has to fetch I think?). That's only if you look up by UIDs, which a well-written client rarely has to do if you use sequence numbers effectively. I imagine that if I created a UID-seq hash table you would then be saying it has to maintain multiple tables to support sequence numbers. And of course it has to update the arrays every time messages are expunged. The only reason why it does that update is for a feature that is so advanced that most clients don't use it; a multithread client (such as my MailManager application) can lock a message cache entry and prevent it from being expunged even if it is expunged on the server. That way it doesn't have to maintain a separate copy, ever; but it does need a back pointer. I doubt very much that there's anything like that in your UID-only client. Pine also doesn't seem to do any automatic reconnection to server. I'd think that would get annoying with bad internet connections. It's rare to need to do automatic reconnection, even with flakey network connections, if you do networking the way you're supposed to. Such as not killing a perfectly good TCP connection because of the slightest router flap. Most of the need for automatic reconnection vanishes if you take the trouble not to sacrifice connections. CDPD is about as flakey as you can get, yet Pine works quite well with it. If it did do reconnecting, would the current code require resyncing everything after connect Since we never fetch everything, we never need to resync everything. Just toss out the local cache and do demand-fetching. Demand-fetching is good. 
Demand fetching means that you work on slow connections. Pine works well over CDPD. Does your client?
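A minimal sketch of the demand-fetching idea described in this message, in Python. It assumes a hypothetical `fetch_fn` callback that stands in for whatever FETCH command a real client would issue; nothing here is taken from Pine or c-client.

```python
class SummaryCache:
    """Hypothetical demand-fetch cache keyed by sequence number."""

    def __init__(self, fetch_fn):
        # fetch_fn is an assumed callback: list of sequence numbers -> {seq: summary}
        self.fetch_fn = fetch_fn
        self.cache = {}

    def visible(self, first, last):
        # Ask the server only for the visible entries that are not cached yet
        wanted = range(first, last + 1)
        missing = [seq for seq in wanted if seq not in self.cache]
        if missing:
            self.cache.update(self.fetch_fn(missing))
        return [self.cache[seq] for seq in wanted]

    def reset(self):
        # After a lost connection: toss the local cache and demand-fetch again
        self.cache.clear()
```

Because little is ever cached, `reset()` throws away little, which is the argument made above for not needing a full resync. A complete client would also have to renumber or drop entries on EXPUNGE, which this sketch leaves out.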
Re: Unicode newsgroup name options
D J Bernstein [EMAIL PROTECTED] writes: Actually, there's very little opposition (especially among implementors) to requiring all MTAs, MUAs, etc. to handle UTF-8 messages. Eventually we will all be using UTF-8; all relevant bugs must be fixed. Only the wildest ``7 bits forever!'' proponents, such as Keith Moore, disagree. The real controversy is over whether we should also do _other_ things before UTF-8 is working everywhere. For example, should we introduce some ad-hoc 7-bit character encoding for newsgroup names? Many of us (especially implementors) believe that these short-term 7-bit kludges have huge costs (as illustrated by your message) and miniscule benefits. We believe that the 7-bit kludges should be dropped. Our opponents are claiming that the IESG will demand a 7-bit solution. But they aren't opposing the requirement of UTF-8 support; they're opposing the reliance on UTF-8 as the sole solution. I'm not sure that I agree with your summary of the positions, but I'm certainly sympathetic to this viewpoint. Just implementing UTF-8 feels a lot cleaner to me too. However, my main interest personally is to get something published by the IETF documenting the Usenet article format that isn't as horribly obsolete and out of date as RFC 1036 is. I'm also most definitely not a mail system implementor or an IMAP implementor and don't know what issues implementors in those areas face. One additional option that I didn't mention would be to decide that the IETF standards process is out of touch with the reality of what implementors want and to then simply punt on specifying a non-ASCII encoding for newsgroups in the standard (so as not to get dragged into these arguments) and encourage anyone who wants to use a non-ASCII character set in practice to use UTF-8. This will work for most news server software. 
I personally don't have a sufficient grasp on the issues facing news client implementors, IMAP implementors, or mail system implementors to know whether that's a viable solution outside of news, and if so, whether it's the best solution available. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: Unicode newsgroup name options
On Fri, 21 Feb 2003, Russ Allbery wrote: D J Bernstein [EMAIL PROTECTED] writes: Actually, there's very little opposition (especially among implementors) to requiring all MTAs, MUAs, etc. to handle UTF-8 messages. Eventually we will all be using UTF-8; all relevant bugs must be fixed. Only the wildest ``7 bits forever!'' proponents, such as Keith Moore, disagree. I disagree, in the very strongest terms, with both this categorization of the situation and with the personal attack. Many of us (especially implementors) believe that these short-term 7-bit kludges have huge costs (as illustrated by your message) and miniscule benefits. We believe that the 7-bit kludges should be dropped. The definition of many and we is extremely subjective, particularly as it is a minority opinion. One additional option that I didn't mention would be to decide that the IETF standards process is out of touch with the reality of what implementors want The people who advocate this option are those individuals who have not been able to get their way. I strongly suggest that you plug your ears when you hear this siren song, lest you cause a disaster for yourself and wreak havoc on others. -- Mark -- http://staff.washington.edu/mrc Science does not emerge from voting, party politics, or public debate.
Re: Getting rid of the sequence numbers
On Fri, 21 Feb 2003, Timo Sirainen wrote:

But whenever sorting is done, there is the sort array that has to be updated and accessed slowly whenever you get fetch envelope reply (pine_imap_envelope -> mn_raw2m() -> msgno_in_sort()).

Wrong. What you are seeing in Pine is a mapping from a view. That is something that would have also to be done (and be much slower) in a UID client that had the same functionality.

Also if it's sorted in any way. I guess sequences work well enough for unsorted mailboxes.

They work better than UIDs for sorted or unsorted.

Since we never fetch everything, we never need to resync everything. Just toss out the local cache and do demand-fetching.

Meaning that you toss out a completely usable cache just because you don't want to use UIDs?

Since we never fetch all that much into the cache, there isn't that much cost. Note too that on shared machines a cache that persists beyond sessions is a security bug. Finally, if the user uses multiple machines, then that's a lot of duplication of mail (including sensitive information). Persistent caching is only good if a user sticks to one or two dedicated machines.

-- Mark -- http://staff.washington.edu/mrc Science does not emerge from voting, party politics, or public debate.
Re: mail vs. news ???
On Fri, 21 Feb 2003, Russ Allbery wrote: Usenet's restrictions on the syntax of message ID headers are very specific and very precise, and much stronger than those of RFC 2822, in part because message IDs are used as part of the NNTP protocol. What are those restrictions? Comments in various places that mail supports them are not well-supported by currently deployed Usenet software (although it certainly hurts nothing to support them when writing new code, other than adding complexity). The space after the colon in headers is not optional on Usenet. The syntax of the Date header is restricted in ways somewhat similar to that of the Message-ID header. Golly gee, where's the chorus of these are bugs that should be fixed now? First we hear the claim that 7-bit messaging restrictions in mail are a bug that should be fixed even though 7-bit was specifically in the standard. Now we hear the claim that completely unnecessary restictions in headers are necessary because of news software. And the IETF/IESG is supposed to respect this? - National 8-bit character sets are in widespread use in Usenet message headers, possibly more widespread than they are in (non-spam) mail messages. Untagged 8-bit national character sets are widely used in various non-English hierarchies in headers as the preferred way of including such content, and in some cases use of RFC 2047 is frowned on. This is because portions of the news community listened to the siren song of just send 8-bits offered by those individuals who song was rejected in mail. Now the news community has a non-interoperable disaster. But rather than fix the disaster, they seem to want to inflict a new disaster upon the email community. The solution to interoperability is to stop claiming that news is special, and start playing ball with the rest of the messaging world. This means making compromises, including at times accepting what seems to be unnecessary limitations, in order to achieve interoperability. 
-- Mark -- http://staff.washington.edu/mrc Science does not emerge from voting, party politics, or public debate.
Re: Unicode newsgroup name options
On Thu, 20 Feb 2003, Russ Allbery wrote:

 |  1  2  3  4  5  6  7  8  9 10 11 12 13 14
-+------------------------------------------
A|  D  C  N  N  D  Y  N  N  N  Y  N  D  Y  D
B|  Y  Y  C  Y  Y  N  N  N  N  N  N  N  C  D
C|  D  C  C  C  D  N  N  N  N  N  N  N  C  D

Choice A has three Ys, four Ds, and one C. Choice B has four Ys, one D, and two Cs. Choice C, the ugly duckling, has no Ys, three Ds, and four Cs. Choice A has one fewer N than B or C.

If you believe in the premise that a Y is more expensive than a D, and a D is more expensive than a C, then choice C (punycode everywhere) stands out as the preferable choice. The more expensive that you rate a Y compared to a D or C, the better that choice C looks.

Now, let's factor out the items in which all three choices are equivalent, and the superiority of choice C becomes even more apparent.

 |  1  2  3  4  5  6 10 12 13
-+---------------------------
A|  D  C  N  N  D  Y  Y  D  Y
B|  Y  Y  C  Y  Y  N  N  N  C
C|  D  C  C  C  D  N  N  N  C

-- Mark -- http://staff.washington.edu/mrc Science does not emerge from voting, party politics, or public debate.
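The tallies and the "factor out the equivalent items" step in the message above can be checked mechanically. The following Python sketch transcribes the three rows of the 14-column table; the numeric weights are an assumption, since the message only states the ordering Y > D > C, and treating N as zero cost is likewise assumed.

```python
from collections import Counter

# Rows transcribed from the 14-column table in the message above
TABLE = {
    "A": "D C N N D Y N N N Y N D Y D".split(),
    "B": "Y Y C Y Y N N N N N N N C D".split(),
    "C": "D C C C D N N N N N N N C D".split(),
}

# Assumed weights: only the ordering Y > D > C is from the message; N = 0 is a guess
WEIGHTS = {"Y": 4, "D": 2, "C": 1, "N": 0}


def tally(choice):
    """Count how often each letter appears in one choice's row."""
    return Counter(TABLE[choice])


def cost(choice, weights=WEIGHTS):
    """Sum the per-column weights for one choice."""
    return sum(weights[cell] for cell in TABLE[choice])


def differing_columns():
    """1-based column numbers where the three choices are not all equal."""
    rows = list(TABLE.values())
    return [i + 1 for i, cells in enumerate(zip(*rows)) if len(set(cells)) > 1]
```

With these particular weights, `cost("C")` is the lowest of the three, and `differing_columns()` yields the nine columns kept in the reduced table. Different weights that respect the same ordering could change the gap between A and B, which is why the weights are kept as a parameter.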
Re: Getting rid of the sequence numbers
On Fri, 21 Feb 2003, Timo Sirainen wrote:

I don't know anyone who accesses their mail from more than a few computers.

In a typical day, I use from three to five different computers to access my mail. So do my co-workers.

I use IMAP only at home for accessing my mails, elsewhere I just ssh into my server and read the mails there. Good and secure ssh clients are easier to find and set up than IMAP clients.

Well, you stated your problem: you don't use a good IMAP client. I wouldn't think of ssh'ing into my server to read my mail. I happen to have that privilege as a server software maintainer, but most users cannot.

I especially would not think of ssh'ing into a server to read mail via CDPD. Shudder. Yet, with a good IMAP client such as Pine, I have a difficult time telling that I'm using CDPD instead of an 802.11b wireless, unless the message is large. Since I pay a flat rate for CDPD, I'll use it instead of 802.11b when travelling if the local access point charges for 802.11b.

-- Mark -- http://staff.washington.edu/mrc Science does not emerge from voting, party politics, or public debate.
Re: mail vs. news ???
Mark Crispin [EMAIL PROTECTED] writes: On Fri, 21 Feb 2003, Russ Allbery wrote: Usenet's restrictions on the syntax of message ID headers are very specific and very precise, and much stronger than those of RFC 2822, in part because message IDs are used as part of the NNTP protocol. What are those restrictions? The primary ones are: * Absolutely no occurences of either whitespace or the character, escaped or not, are permitted inside the message ID. Either is known to break existing software in various ways. * Nothing is permitted in the Message-ID header other than the message ID itself. Comments either preceding or following the message ID will cause the message to be rejected by many news servers. * The message ID must not be longer than about 500 characters. The failure mode for violating this rule tends to be rather nasty for some existing NNTP software, including things like desynchronization of the protocol between the client and server. NNTP (unfortunately) has a maximum command length defined as part of the protocol. In practice, many news servers enforce a 250 octet limit (including the surrounding angle brackets). Please note that I'm not arguing that these restrictions are desirable, simply that violating them *will* break existing news software. I also don't think that fixing one and possibly two is really worth the effort, since there isn't much in the way of useful purpose served by not following those rules anyway. Comments in various places that mail supports them are not well-supported by currently deployed Usenet software (although it certainly hurts nothing to support them when writing new code, other than adding complexity). The space after the colon in headers is not optional on Usenet. The syntax of the Date header is restricted in ways somewhat similar to that of the Message-ID header. Golly gee, where's the chorus of these are bugs that should be fixed now? Are you expecting me to serve as the chorus? 
I certainly hope that you're not expecting me to try to be consistent with statements made by other people that I don't necessarily agree with. I tend to hold my own opinions and not necessarily agree with other people. :) First we hear the claim that 7-bit messaging restrictions in mail are a bug that should be fixed even though 7-bit was specifically in the standard. Now we hear the claim that completely unnecessary restictions in headers are necessary because of news software. These restrictions are published in RFC 1036, so I would not expect them to be a surprise to news implementors. Usenet has, since B news, used a subset of the mail messaging format. RFC 1036 is unfortunately imprecise about precisely what additional restrictions it put on the message format, but at the least the space after the colon in headers is quite explicit. The message ID restrictions are also fairly clear apart from the length limitation (which falls out of the NNTP protocol instead). (The bit in RFC 1036 about slashes being strongly discouraged in message IDs is now completely obsolete.) The Date specification in RFC 1036 is obnoxious, referring to a particular software implementation that isn't documented as part of the standard. In practice, an RFC 2822 date that doesn't use any of the obsolete syntax is fine provided that the header is not folded. Issues surrounding comments are more complex. Apart from Date and Message-ID, which are the most sensitive headers that are also shared with mail, comments in References headers are unlikely to cause catastrophic problems but may show up as oddities in the thread tree in a news reader and news software can be fairly picky about the From header (although one is likely fine as long as one avoids the obsolete syntax rules). And the IETF/IESG is supposed to respect this? My message was solely addressing the differences *in practice* that exist right now on the wire. 
I was not attempting to make any sort of statement about what the future should like. I personally am very strongly in favor of the unification of messaging formats, and think that this is one of the most important things that could come out of USEFOR. I think that it's reasonable to simply require that Usenet software going forward cope with comments in the References header and with the full From syntax in RFC 2822 (possibly omitting the obsolete rules, since they have never been supported on Usenet). I'm ambivalent about folded dates. The date parsing software that I've written personally and that is used in the software I maintain supports them. I don't understand why anyone would generate a folded date, though, so I can understand why people don't see what purpose is served in supporting it. I think that not requiring a space after the colon in headers (except for compatibility with older messages) is silly, but I don't have a strong opinion on it. Changing news software to support this can be a
Re: Getting rid of the sequence numbers
On Fri, 21 Feb 2003, Timo Sirainen wrote:

I don't know anyone who accesses their mail from more than a few computers. I use IMAP only at home for accessing my mails, elsewhere I just ssh into my server and read the mails there. Good and secure ssh clients are easier to find and set up than IMAP clients.

Imagine an environment such as a university where there are many publicly accessible cluster machines. Or perhaps a corporation which doesn't assign specific cubicles and instead has a large number of nonspecific desktop machines. In such an environment, not only are different machines being used by many different users continuously, but individual users are not limited to specific machines. (Of course, this has its own issues like ensuring preferences follow the users, but it's a very real environment).

-Rob

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Rob Siemborski | Andrew Systems Group * Research Systems Programmer PGP:0x5CE32FCC | Cyert Hall 207 * [EMAIL PROTECTED] * 412.268.7456
-BEGIN GEEK CODE BLOCK Version: 3.12 GCS/IT/CM/PA d- s+: a-- C$ ULS$ P+++$ L+++() E W+ N o? K- w O- M-- V-- PS+ PE++ Y+ PGP+ t+ 5+++ R tv- b+ DI+++ G e h r- y? --END GEEK CODE BLOCK-
Re: Unicode newsgroup name options
Mark Crispin [EMAIL PROTECTED] writes:

Now, let's factor out the items in which all three choices are equivalent, and the superiority of choice C becomes even more apparent.

 |  1  2  3  4  5  6 10 12 13
-+---------------------------
A|  D  C  N  N  D  Y  Y  D  Y
B|  Y  Y  C  Y  Y  N  N  N  C
C|  D  C  C  C  D  N  N  N  C

That was the conclusion that was jumping out at me as well, but here are a few other things to keep in mind:

* These columns don't really have equal weight, in that some of them represent small numbers of installations of relatively easily changed software and some of them represent very large installed bases or software that's difficult to change. In particular, (1) and (5), the installed base of news readers, will take a very long time to change, and (2), (3), and (4), the news server software, we know is rarely updated and upgrades are difficult to motivate. Usenet server software routinely runs for years on autopilot without any maintenance. By comparison, (6), the process that sends mail to moderated groups, is a small and easily changed component in most situations, and the number of moderators (10), news to mail gateways (12), and IMAP servers serving news messages (13) are all, while certainly significant, much smaller than the number of installations of the core Usenet software.

* The single largest set of installed software, (1) and (5), is almost a C for proposal (A). We know that some existing software will work with UTF-8 newsgroup names out of the box without modification, although it will require some tweaking for ideal operation. By comparison, punycode (C) we know won't work correctly with *any* existing software; the only reason why that column is a D instead of Y is that users can use the funny-looking encoded names and still participate in the groups. This is one of the stronger arguments in favor of (A), namely that you can implement it to a surprising degree without changing any news software at all.

* There are other issues not reflected on this matrix at all, such as complexity of implementation and compatibility with the existing messaging format, that weigh in various directions.

Again, this is not to disagree with your conclusion. I just wanted to point out that while I found the table helpful, it's a bit over-simplified and hides the nature of some of the issues and tradeoffs.

-- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: Getting rid of the sequence numbers
On Fri, 22 Feb 2003, Timo Sirainen wrote:

Well, you stated your problem: you don't use a good IMAP client.

That could be it. Installing and running would have to be as easy as sshing with putty though. Meaning you could get imapclient.exe from web page which you can run directly, only required configuration should be entering username, password and IMAP server host.

That sounds like PC Pine.

Actually it would also need SMTP server configuration unless it acts as such itself. Maybe IMAP server could give some SMTP configuration hints to client so user wouldn't have to set those manually.

We don't have that, but we do have a way to have a remote Pine configuration so that a user can use Pine on multiple platforms and get your one true configuration each time. We even have a way to do NNTP proxy via IMAP, with minimal noticeable difference between that and direct NNTP. That keeps your newsrc on the same place too.

So on topic summary: I'm not against clients doing well optimized server fetches, but I don't think clients failing to do so are useless crap.

In my experience, clients which make dumb mistakes such as:

    1 UID FETCH 237 FLAGS
    * 3 FETCH (FLAGS (\Seen) UID 237)
    1 OK done
    2 UID FETCH * FLAGS
    * 4 FETCH (FLAGS (\Seen) UID 483)
    2 OK done
    4 UID FETCH 238:482 FLAGS
    4 OK no messages there

will do other stupid things as well.

It gets worse. The same UID-only client that fails to realize that there can't be any UIDs between a UID with sequence 3 and a UID with sequence 4, also fails to grasp that the failure to find any UIDs in that range means that there won't ever be any in that range, and keeps on trying to find them.

There are clients which do UID FETCH 1:* UID repeatedly in the same session. Some of these clients do it as a poll for new mail, since they disregard the EXISTS response. Then there are the clients which spawn connections for no good reason, but that's another story.

This isn't not doing well-optimized server fetches. This is doing well-pessimized server fetches.

And I still don't see how sequences would be inherently better for client to use than UIDs.

If you don't use sequences, then each and every cache reference requires a lookup to locate the associated message for the UID in the cache. At best, this is a hash. With sequences, it's an index.

If you use sequences, you know when you get an EXISTS precisely how many new messages there are (if in fact there are any). With UID-only, you have to do lastuid+1:* and woe be it to you if the server is one of the broken ones (like Courier) which incorrectly assumes that the left side of the : must be less than the right side.

If you use sequences, you know when you get an EXPUNGE precisely which message was expunged (hence your request for an extra UIDEXPUNGE, which would burden all IMAP sessions with additional traffic -- remember that EXPUNGE can be unsolicited).

If you use sequences, your commands, especially when the sets get large, will be much smaller than with UIDs. So will your SEARCH, SORT, and THREAD results.

Last but not least, if you use sequences, you as a programmer are compelled to consider silly cases (such as I indicated above), and avoid doing them. You can't build a sequence between, but not including, 3 and 4; therefore you know that there's nothing to do. With UID-only, the silliness of what you may be doing is obscured from you, and your client ends up embarrassing itself to server maintainers who run protocol traces to answer their customers' questions as to "why is it so slow?"

Time and time again, I hear the advocates of UID-only claim that what they are doing is better or more efficient. Time and time again, when I see what the client does over the wire, these claims ring hollow.

-- Mark --
http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
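
The points in the message above about EXISTS, EXPUNGE, and the empty UID range between adjacent sequence numbers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `SeqCache` and its method names are hypothetical and are not taken from Pine, c-client, or any other real client, and the cache stores nothing but a UID per message.

```python
class SeqCache:
    """Hypothetical client cache: a list of UIDs indexed by sequence number."""

    def __init__(self):
        # uids[seq - 1] holds the UID at that sequence number,
        # or None if the client has not fetched it yet.
        self.uids = []

    def on_exists(self, count):
        """Handle '* <count> EXISTS'; return how many new messages arrived."""
        new = max(0, count - len(self.uids))
        self.uids.extend([None] * new)
        return new

    def on_fetch_uid(self, seq, uid):
        """Record the UID reported in '* <seq> FETCH (UID <uid>)'."""
        self.uids[seq - 1] = uid

    def on_expunge(self, seq):
        """Handle '* <seq> EXPUNGE': every later message shifts down by one."""
        return self.uids.pop(seq - 1)

    def range_is_provably_empty(self, lo, hi):
        """True if UIDs lo..hi fall strictly between two adjacent known messages.

        UIDs ascend with sequence number, so nothing can ever appear
        between the UIDs of sequence n and sequence n + 1.
        """
        for a, b in zip(self.uids, self.uids[1:]):
            if a is not None and b is not None and a < lo and hi < b:
                return True
        return False


cache = SeqCache()
cache.on_exists(4)
cache.on_fetch_uid(3, 237)
cache.on_fetch_uid(4, 483)
# A client holding this state has no reason to send UID FETCH 238:482.
skip_fetch = cache.range_is_provably_empty(238, 482)
```

A lookup by sequence number here is a plain list index, while a UID-only design would need a dictionary or similar keyed on UID for every cache reference.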
Re: mail vs. news ???
Ken Murchison [EMAIL PROTECTED] writes:

OK. As I suspected there is nothing inherent in RFC [2]822 that makes it unsuitable for news.

Spaces in message IDs make them unsuitable for news. This really, really does break things, honest, I swear. I'm not just making this up. :) In general, the statements in RFC 1036 putting additional limitations on the article format *are* implemented on Usenet and *are* relied on in practice by Usenet software.

But, rewriting your statement to say "there is nothing inherent in RFC 2822 with RFC 1036 limitations applied that makes it unsuitable for news", I agree.

So, the claims that mail and news are not the same were misleading and/or wishful thinking.

That is quite certainly my opinion.

--
Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: Unicode newsgroup name options
The wildmat problem is a red herring. Wildmat implementations need to be cognizant of Unicode in far more substantial ways than merely overcoming punycode issues. A well-thought-out stringprep requirement will help some, but then the stringprep has to be implemented.

This last point of Mark's is worth noting. Regardless of whether you end up with UTF-8, punycode, UTF-7, or some other encoding of Unicode, a stringprep profile is going to be required in order to move forward. It can either be specified directly or indirectly as part of the underlying encoding, but it has to be there.

Part of the problem with having a group take so long to complete its work is that what's minimally acceptable to the IETF/IESG changes over time. It used to be that a protocol could just use some encoding of Unicode, end of story. No more. Now that stringprep exists, the IESG requires that it be used when Unicode protocol elements are involved. Several recent specifications have been returned to their respective working groups because they didn't do this.

I note in passing that while the current News Article Format Draft talks about normalization of group names, it does so without reference to stringprep. This would need to change: stringprep covers stuff besides normalization, and more generally provides a checklist for Unicode usage each protocol needs to consider.

I also note that Dan Kohn's alternative News Article Format draft uses punycode, which already includes a stringprep profile.

Ned
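
As a rough illustration of the remark that stringprep covers more than normalization, here is a minimal Python sketch of the usual map, normalize, and prohibit steps, built on the standard library's `stringprep` tables. The function name `prep_group_name` and the particular choice of tables are assumptions made for this example; they are not the profile of any News Article Format draft, nor of nameprep.

```python
import stringprep
from unicodedata import ucd_3_2_0  # stringprep is defined against Unicode 3.2


def prep_group_name(name):
    """Hypothetical stringprep-style preparation of a newsgroup name."""
    # Step 1, map: drop characters "commonly mapped to nothing" (table B.1)
    # and case-fold the rest (table B.2).
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in name
        if not stringprep.in_table_b1(ch)
    )

    # Step 2, normalize: apply NFKC.
    normalized = ucd_3_2_0.normalize("NFKC", mapped)

    # Step 3, prohibit: reject spaces (C.1.1, C.1.2) and control
    # characters (C.2.1, C.2.2). The choice of tables is an assumption.
    for ch in normalized:
        if (
            stringprep.in_table_c11_c12(ch)
            or stringprep.in_table_c21_c22(ch)
        ):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")

    return normalized
```

Only the second step is normalization; the mapping and prohibition steps are the kind of additional decisions a profile would have to spell out.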