Here's what I've got so far.  Comments would be appreciated.

Bill
======================================================================

This module implements email threading per RFC 5256.  It provides four
classes: ThreadableObjectStore, MailboxStore, ReferencesSet, and
OrderedSubjectSet.

To use it, you need to provide it with a "mailstore" and a set of
messages to thread.  The mailstore must be an instance of a subclass
of the abstract class ThreadableObjectStore; an implementation of
ThreadableObjectStore for mailbox.Mailbox is provided as the class
MailboxStore.

Four methods must be implemented for a new ThreadableObjectStore
subclass:

  tos_get_message_id (msg or message ID) => message ID

    where the message ID is an immutable value that must be unique
    in that ThreadableObjectStore context, and the msg can be
    whatever that ThreadableObjectStore considers a message.

  tos_get_subject (msg or message ID) => subject

    where the subject is the subject of the message, or None.

  tos_get_date (msg or message ID) => timestamp

    where the timestamp is the date and time of the message,
    expressed as seconds since the epoch, in the form returned by
    Python's time.time().

  tos_get_references (msg or message ID) => sequence of message IDs

    where the references are a sequence of message IDs, arranged in
    order as per RFC 5322.  These message IDs must be in the same
    format as the message ID returned by tos_get_message_id().

The base ThreadableObjectStore class also provides a class method to
compute the RFC 5256 "base subject":

  ThreadableObjectStore.tos_base_subject (subject text) => \
      subject, is_reply_or_forward

    Takes a standard Subject: header value and returns the "base
    subject" for it, along with a boolean flag indicating whether
    the supplied subject indicated a reply to, or a forward of, the
    original subject.

To develop a set of threads, you then instantiate either
ReferencesSet (the JWZ algorithm from Netscape, formalized in RFC
5256) or OrderedSubjectSet (the "same subjects" algorithm, aka "poor
man's threading"), both subclasses of the abstract class ThreadSet.
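To make the four required store methods concrete, here is a minimal
sketch of a store backed by plain dicts.  It is an illustration only:
the ThreadableObjectStore defined here is a stand-in with an assumed
shape so that the example runs on its own, and DictStore, its dict
keys ("id", "subject", "date", "references"), and the sample messages
are all hypothetical names, not part of the module.

```python
import abc


class ThreadableObjectStore(abc.ABC):
    """Stand-in for the module's abstract base class (assumed shape)."""

    @abc.abstractmethod
    def tos_get_message_id(self, msg): ...

    @abc.abstractmethod
    def tos_get_subject(self, msg): ...

    @abc.abstractmethod
    def tos_get_date(self, msg): ...

    @abc.abstractmethod
    def tos_get_references(self, msg): ...


class DictStore(ThreadableObjectStore):
    """Hypothetical store whose messages are plain dicts."""

    def __init__(self, messages):
        # Index the messages by ID so lookups by message ID work.
        self._by_id = {m["id"]: m for m in messages}

    def _lookup(self, msg):
        # Each method accepts either a message (a dict) or a message ID.
        return msg if isinstance(msg, dict) else self._by_id[msg]

    def tos_get_message_id(self, msg):
        return self._lookup(msg)["id"]

    def tos_get_subject(self, msg):
        # None when the message has no subject.
        return self._lookup(msg).get("subject")

    def tos_get_date(self, msg):
        # Seconds since the epoch, as time.time() would return.
        return self._lookup(msg)["date"]

    def tos_get_references(self, msg):
        # Same ID format as tos_get_message_id(); empty if none.
        return tuple(self._lookup(msg).get("references", ()))


store = DictStore([
    {"id": "a@example.org", "subject": "Threading",
     "date": 1250000000.0},
    {"id": "b@example.org", "subject": "Re: Threading",
     "date": 1250000600.0, "references": ["a@example.org"]},
])
```

A store like this would then be handed to one of the thread set
constructors in place of MailboxStore.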
Each thread set constructor takes a ThreadableObjectStore instance
and, optionally, a set of messages to use for the initial threads.
If provided, those messages are analyzed into a set of threads.  The
thread set is iterable; the iteration is over the threads it contains.

An instance of ThreadSet provides the following methods:

  add (msg or message ID) => thread

    add another message from the mailstore to the thread set, where
    "thread" is an object which has the attributes "message_id" (a
    string) and "children" (an ordered list of sub-threads), and is
    the root of the thread tree for that msg.

  remove (msg or message ID) => thread

    remove a message from the thread set, where "thread" is as for
    "add()", but may additionally be 'None' if the message was not
    in a thread, or was the only message in the thread.

  thread (msg or message ID) => thread

    obtain the thread containing the specified message, if any,
    where "thread" is as for "add()", or 'None' if no thread for
    that message exists.

  subject_threads (subject regexp) => set of threads

    obtain the threads where the base subject of the thread contains
    a match for the specified regular expression, where "regexp" is
    a textual or compiled regular expression, and the return value
    is a set of threads.  Note that subject comparisons are
    case-insensitive; compiled regexps must use the re.IGNORECASE
    flag.

  date_threads (starting time, ending time, root_only=False) =>
      set of threads

    obtain the set of threads containing any messages between the
    two timestamps.  Timestamps are time.time() timestamps; the
    starting time may be specified as 'None' to mean the start of
    time, and the ending time as 'None' to mean the distant future.
    If "root_only" is true, only the dates of the roots of each
    thread are considered; threads with no root message (a subject
    forest) will always fail to match in this case.

  __contains__ (msg or message ID) => boolean

    Present to support the "in" operator.
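Since every method above hands back the root of a thread tree, a
short sketch of walking one may help.  This is not the module's code:
Thread is a stand-in class with only the two documented attributes
("message_id" and "children"), walk() is a hypothetical helper, and
the message IDs are made up.  Allowing message_id to be None for a
root with no message is an assumption based on the "subject forest"
case mentioned above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Thread:
    """Stand-in thread node (assumed shape, not the module's class)."""
    message_id: Optional[str]
    children: List["Thread"] = field(default_factory=list)


def walk(thread: Thread, depth: int = 0) -> Iterator[Tuple[int, Optional[str]]]:
    """Yield (depth, message_id) for each node, parents before children."""
    yield depth, thread.message_id
    for child in thread.children:
        yield from walk(child, depth + 1)


root = Thread("a@example.org", [
    Thread("b@example.org", [Thread("c@example.org")]),
    Thread("d@example.org"),
])

# Render the tree as an indented outline, one message ID per line.
outline = ["  " * depth + (mid or "(no message)")
           for depth, mid in walk(root)]
print("\n".join(outline))
```

The same traversal would apply to each thread obtained by iterating
over a thread set.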
Support for persistence is provided with an instance method
"to_external_form" and a class method "from_external_form" on thread
sets.  Calling "to_external_form" on a thread set instance will
generate a set of tree-structured nested tuples, where each tuple
consists of an optional message ID followed by zero or more child
tuples.  The class method "from_external_form", provided by both
ReferencesSet and OrderedSubjectSet, takes a ThreadableObjectStore
instance and an externalized thread set value, and creates and
returns a new thread set instance initialized to that set of threads.

MailboxStore is a subclass of ThreadableObjectStore designed to wrap
mailboxes (subclasses of mailbox.Mailbox).  For instance,

  >>> import mailbox
  >>> mbox = mailbox.mbox("foo.mbox")
  >>> mboxstore = MailboxStore(mbox)
  >>> threadset = ReferencesSet (mboxstore, mbox.itervalues())

will produce a thread set for all the messages in the mbox-format
mailbox 'foo.mbox', using the REFERENCES threading algorithm.

MailboxStore also provides a static method to compute the normalized
form of a message ID (the message ID stripped of <> angle brackets,
and various quoted parts unquoted):

  MailboxStore.normalize_message_id (message ID) => message ID

    Take a standard RFC 5322 message ID string and return the
    normalized form of it.

_______________________________________________
Email-SIG mailing list
Email-SIG@python.org
Your options: http://mail.python.org/mailman/options/email-sig/archive%40mail-archive.com