On Fri, Apr 6, 2012 at 3:42 AM, Pierre-Yves Chibon <pin...@pingoured.fr> wrote:
> So when it parses the email, it checks for 'References' or
> 'In-Reply-To'.
> - If it finds them, it looks for the preceding email
> - if it finds the preceding email, then the current email gets the
> ThreadID from the preceding email

So far, so good.

> - if it does not find the preceding email, then the current email is
> assumed to be a new thread

This is unacceptable.  Mailing lists are not synchronous (eg, because
of greylisting for one, but there are plenty of reasons why the mail
doesn't always go through immediately).  Threads must be able to
integrate new messages as they arrive, even if out of order.

> and thus its ThreadID is its Message-ID
> - if it does not find 'References' or 'In-Reply-To', then the current
> email is assumed to be a new thread and thus its ThreadID is its
> Message-ID

This isn't quite unacceptable, but it's clearly suboptimal.
(Well-known algorithms that handle this case nicely are available.)

> So for the example you give, the archiver will receive your email and
> make a new thread out of it.

That's an archiver that I won't use, and will strongly oppose as a
candidate for the bundled archiver for Mailman (any version).

>> I haven't thought about it deeply, but I would say just give the
>> thread an arbitrary ID in the database. Message-IDs are supposed to
>> universally unique, so what's wrong with keeping the thread in the
>> database as a tree of message IDs? Some Message-IDs will not have
>> corresponding messages but that's always a problem with threading (see
>> http://www.jwz.org/doc/threading.html, and RFC 5256).
>
> The idea of using the Message-ID for ThreadID (instead of a integer) is
> that, if I whether I load one months or two months of archives into the
> database, the link to the thread
> (http://mm3test.fedoraproject.org/thread/packaging@fp.o/XU7HT5JC5GND2O4JII7MTQILLTB4IN4S)
> will remain the same (so consistent urls).

Sure, but this is a matter of a persistent ID in the database.
When I say "arbitrary" I don't mean you can't use a message ID to
represent a thread if you like; I mean that you can't algorithmically
compute it in a reliable, history-independent way.  From the point of
view of a user, you can't even be sure that a message without
References or In-Reply-To is a thread root (users will note the
subject and the content, and they will be displeased with any
threading algorithm that doesn't at least group subjects).  I don't
say you need to implement that part of the JWZ/5256 algorithm
immediately, but you must not use a database schema that makes it hard
to add that feature later.

In most cases, users will have access to a Message-ID for some message
in the thread.  So I would want a URL like

    http://lists.example.com/archive/some-list/thread/MessageID/root/

to find the thread root for any message in the thread, not just for a
particular representative of the thread.  (YMMV for the URL scheme, of
course.)  The last component of the URL path just gives the focus
(message to actually display and/or highlight in a tree widget); other
useful focuses might be "latest" (a message in the thread with the
most recent Date or Received header) and "self" (the message itself is
the focus).  More speculative focuses would be "parent" (obvious, I
hope) and "node" (the most recent ancestor message with multiple
children).

>> There are other problems with threading that need to be dealt with as
>> well, such as References being inconsistent across messages in the
>> same thread and people who continue a thread with a new message, etc.
>
> For these I am not sure I can do something (at least automatically, we
> could always allow an admin to edit the field).

You must do something about inconsistent References.  Suppose there is
a References loop?  It needs to be broken, somehow, or your program
will infloop.  Anyway, this is all already taken care of in Jamie's
algorithm.
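
To make the out-of-order and loop points concrete, here is a minimal
sketch in Python of the placeholder idea: a referenced Message-ID gets
a node in the tree even before its message arrives, and a link that
would close a loop is simply refused.  This is only a fragment in the
spirit of the JWZ/RFC 5256 approach, not the full algorithm (no subject
grouping, no pruning of empty placeholders, no "latest" focus since no
dates are stored).  The names ThreadIndex, Node, add, root and focus
are made up for illustration and are not part of Mailman or of any
archiver.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # compare nodes by identity; the tree is self-referential
class Node:
    """One slot in a thread tree; `seen` stays False for a placeholder."""
    msgid: str
    seen: bool = False
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)


class ThreadIndex:
    """Hypothetical in-memory index keyed by Message-ID."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def _node(self, msgid: str) -> Node:
        # Create a placeholder the first time a Message-ID is mentioned.
        if msgid not in self.nodes:
            self.nodes[msgid] = Node(msgid)
        return self.nodes[msgid]

    @staticmethod
    def _is_ancestor(a: Node, b: Optional[Node]) -> bool:
        # True if `a` is `b` itself or one of b's ancestors.
        while b is not None:
            if b is a:
                return True
            b = b.parent
        return False

    def _link(self, parent: Node, child: Node) -> None:
        # Keep an existing link, and refuse any link that would close a loop.
        if child.parent is not None or self._is_ancestor(child, parent):
            return
        child.parent = parent
        parent.children.append(child)

    def add(self, msgid: str, references: List[str]) -> None:
        """Add a message; `references` lists its ancestors, oldest first."""
        chain = [self._node(r) for r in references] + [self._node(msgid)]
        chain[-1].seen = True
        for parent, child in zip(chain, chain[1:]):
            self._link(parent, child)

    def root(self, msgid: str) -> Node:
        # Terminates because _link never creates a cycle.
        node = self.nodes[msgid]
        while node.parent is not None:
            node = node.parent
        return node

    def focus(self, msgid: str, which: str = "self") -> Optional[Node]:
        """Resolve the last URL path component to a node, if there is one."""
        node = self.nodes[msgid]
        if which == "self":
            return node
        if which == "parent":
            return node.parent
        if which == "root":
            return self.root(msgid)
        if which == "node":  # nearest ancestor with more than one child
            anc = node.parent
            while anc is not None and len(anc.children) < 2:
                anc = anc.parent
            return anc
        raise ValueError(f"unknown focus: {which}")


idx = ThreadIndex()
idx.add("c@example.com", ["a@example.com", "b@example.com"])  # reply first
idx.add("a@example.com", [])                                  # root arrives late
print(idx.root("c@example.com").msgid)
```

The reply is filed under a placeholder for its ancestors, and the late
root just fills in the placeholder, so the thread root for any message
is found by walking up the tree rather than being fixed at the moment
the first message happens to arrive.  A persistent thread ID can then
be whatever the database likes, stored alongside the tree.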