On Fri, 2012-04-06 at 00:10 +0900, Stephen J. Turnbull wrote:
> On Thu, Apr 5, 2012 at 10:41 PM, Pierre-Yves Chibon <pin...@pingoured.fr>
> wrote:
>
> > In HyperKitty to be able to easily retrieve from the database all the
> > threads of a given month or just all the emails of a thread, I created a
> > Field in the database called ThreadID.
> > When I load the archives from mailman into mongo, I look for the absence
> > of the headers 'References' or 'In-Reply-To' to define an email that
> > starts a new thread.
>
> This fails when a thread crosses channels.  Eg,
>
> To: Pierre
> From: Steve
> Message-Id: <x@y.z>
>
> is followed by
>
> To: Steve
> From: Pierre
> Cc: SomeList
> References: <x@y.z>
> Message-Id: <a@b.c>
>
> > Would anyone have an idea on how to generate a stable and delete/reload
> > proof ThreadID?
>
> I don't see how this can be possible.  Eg, in the above scenario you
> construct a thread based on your reply to me.  Then I go, "oh, really
> I should have posted to mm-dev" and repost the thread.  So the
> "Message-ID of root message" fails, and I don't see an alternative
> that can be predicted.  So it may as well be arbitrary (eg, any
> message in the thread) and stored in the database with appropriate
> linkage from thread IDs to message IDs (one-to-many), and vice versa
> (many-to-one).

Ok, I missed something here.
So when it parses an email, it checks for 'References' or 'In-Reply-To'.
- If it finds them, it looks for the preceding email:
  - if it finds the preceding email, then the current email gets the
    ThreadID of the preceding email;
  - if it does not find the preceding email, then the current email is
    assumed to start a new thread, and thus its ThreadID is its own
    Message-ID.
- If it does not find 'References' or 'In-Reply-To', then the current
  email is assumed to start a new thread, and thus its ThreadID is its
  own Message-ID.

So for the example you give, the archiver will receive your email and
make a new thread out of it.

> > The other solution of course being that I regenerate the thread on the
> > fly based on the first email (which is still easy to find), but that
> > will be a lot of db querying.
>
> I haven't thought about it deeply, but I would say just give the
> thread an arbitrary ID in the database.  Message-IDs are supposed to
> be universally unique, so what's wrong with keeping the thread in the
> database as a tree of message IDs?  Some Message-IDs will not have
> corresponding messages but that's always a problem with threading (see
> http://www.jwz.org/doc/threading.html, and RFC 5256).

The idea of using the Message-ID for the ThreadID (instead of an
integer) is that, whether I load one month or two months of archives
into the database, the link to the thread
(http://mm3test.fedoraproject.org/thread/packaging@fp.o/XU7HT5JC5GND2O4JII7MTQILLTB4IN4S)
will remain the same (so consistent URLs).

> There are other problems with threading that need to be dealt with as
> well, such as References being inconsistent across messages in the
> same thread and people who continue a thread with a new message, etc.

For these I am not sure I can do anything (at least automatically; we
could always allow an admin to edit the field).
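
To make the ThreadID rule described earlier in this message concrete, here is a minimal sketch in Python. It is not the actual HyperKitty code: the function names (referenced_ids, assign_thread_id) are hypothetical, a plain dict stands in for the database, and it assumes that any referenced message found in the store counts as "the preceding email", trying the direct parent first.

```python
import re

# Matches one <local@domain> style Message-ID token inside a header value.
MSGID_RE = re.compile(r"<[^<>\s]+>")


def referenced_ids(headers):
    """Return the Message-IDs a mail refers to, nearest ancestor first.

    `headers` is a plain dict of header name -> value (an assumption made
    for this sketch; a real loader would read them from the parsed email).
    """
    in_reply_to = MSGID_RE.findall(headers.get("In-Reply-To", ""))
    # References lists ancestors oldest first, so walk it backwards.
    references = MSGID_RE.findall(headers.get("References", ""))[::-1]
    ordered = []
    for msgid in in_reply_to + references:
        if msgid not in ordered:
            ordered.append(msgid)
    return ordered


def assign_thread_id(headers, thread_of):
    """Store and return the ThreadID for one email.

    `thread_of` maps Message-ID -> ThreadID and plays the role of the
    database (hypothetical stand-in).
    """
    message_id = headers["Message-ID"]
    # Default: no usable parent, so the email starts a new thread and
    # its ThreadID is its own Message-ID.
    thread_id = message_id
    for parent in referenced_ids(headers):
        if parent in thread_of:
            # Preceding email found: inherit its ThreadID.
            thread_id = thread_of[parent]
            break
    thread_of[message_id] = thread_id
    return thread_id
```

With the cross-channel example quoted above, the private mail <x@y.z> never reaches the archiver, so the reply <a@b.c> finds no preceding email and becomes the root of a new thread; later replies to <a@b.c> inherit that ThreadID.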

Pierre
_______________________________________________
Mailman-Developers mailing list
Mailman-Developers@python.org
http://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org
Security Policy: http://wiki.list.org/x/QIA9