On 6/27/07, Chris Abraham <[EMAIL PROTECTED]> wrote:
Hi,
We've noticed a problem with the way listen is threading messages in the
archive. Take a look at this thread:
http://www.openplans.org/projects/astor-playa/lists/astor-playa-project/archive/2007/06/1180965651338/forum_view
What probably happened is that people, out of convenience, started a new
message by replying to an unrelated message from the list, then
changing the subject line and deleting the body. Since listen threads
messages in the archive using the hidden "In-Reply-To" and "References"
headers, those leftover headers caused the new message to be threaded,
incorrectly, with the existing thread.
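
To make the failure concrete, here is a minimal sketch of header-only threading. It is not listen's actual code; parent_id and the two sample messages are made up for illustration, and only the standard library email module is used.

```python
from email import message_from_string

def parent_id(raw_message):
    """Return the Message-ID this message claims to reply to, or None."""
    msg = message_from_string(raw_message)
    in_reply_to = (msg.get("In-Reply-To") or "").split()
    if in_reply_to:
        return in_reply_to[0]
    references = (msg.get("References") or "").split()
    # By convention the last entry in References is the direct parent.
    return references[-1] if references else None

# Hypothetical messages: the second one was created with "reply",
# then given a new subject and an empty body.
original = (
    "Message-ID: <a1@example.org>\n"
    "Subject: Meeting notes\n"
    "\n"
    "Notes from Tuesday.\n"
)
hijacked = (
    "Message-ID: <b2@example.org>\n"
    "In-Reply-To: <a1@example.org>\n"
    "References: <a1@example.org>\n"
    "Subject: Garden volunteers needed\n"
    "\n"
    "Completely new topic.\n"
)

print(parent_id(original))
print(parent_id(hijacked))
```

The second message still names the first as its parent even though the subject is unrelated, so a threader that looks only at these headers files it under "Meeting notes".
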
Gmail handles threading differently. From what we can tell, it does an
intelligent match on the subject lines of messages to create
"conversations." When messages with matching subject lines are sent
months (or some other long period) apart, it treats them as
independent "conversations." I could see this causing problems in
conversations where someone legitimately wants to modify the subject
line without breaking the thread.
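
The heuristic described above can be sketched roughly as follows. This illustrates the guess in this message, not Gmail's real algorithm; the 30-day window, the prefix pattern, and the function names are all assumptions.

```python
import re
from datetime import datetime, timedelta

# Strips leading "Re:", "Fw:" and "Fwd:" markers (assumed set of prefixes).
PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)

def normalize_subject(subject):
    return PREFIX.sub("", subject).strip().lower()

def group_by_subject(messages, max_gap=timedelta(days=30)):
    """Group (subject, sent_at) tuples into conversations.

    Messages share a conversation when their normalized subjects match
    and no more than max_gap separates a message from the previous one.
    """
    conversations = []
    latest = {}  # normalized subject -> index of its newest conversation
    for subject, sent_at in sorted(messages, key=lambda m: m[1]):
        key = normalize_subject(subject)
        idx = latest.get(key)
        if idx is not None and sent_at - conversations[idx][-1][1] <= max_gap:
            conversations[idx].append((subject, sent_at))
        else:
            latest[key] = len(conversations)
            conversations.append([(subject, sent_at)])
    return conversations
```

Under this scheme a reply whose subject was edited always lands in a new conversation, which is the drawback mentioned above.
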
So, learning from Gmail, we may want to take steps to improve
listen's threading behavior. We should probably spend more time
studying how Gmail (and other mail clients) thread messages and come
up with a new specification, and thus avoid the rare but odd situation
we are seeing in the astor-playa list.

Unless you guys know exactly what Gmail does, I wouldn't necessarily
use your impressions of it as a basis. Threading is a pretty hard
problem. Your best bet is probably to go with a known good threading
algorithm and try to make it work with our indexing/archiving. People
alter/change subjects all the time on mailing lists, so using a
combination of subject and response headers is a good idea. Also,
it's pretty bad form (though nonetheless common) to reply to a message
in order to start a new thread, and many mail apps treat such
responses as part of the same thread.
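
One way to combine the two signals, as a rough sketch only: follow the reply headers, but start a new thread when the subject neither matches the parent's nor carries a "Re:"-style prefix. This rule is an assumption made for illustration; it is not what listen or jwz's algorithm does, and the dict layout and function names are hypothetical.

```python
import re

# Leading "Re:", "Fw:" and "Fwd:" markers (assumed set of prefixes).
PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)

def normalize_subject(subject):
    return PREFIX.sub("", subject).strip().lower()

def assign_threads(messages):
    """Map each message id to the id of its thread root.

    messages: list of dicts with 'id', 'subject' and 'parent' keys,
    in arrival order; 'parent' is the id taken from the reply headers,
    or None when the message has none.
    """
    by_id = {}
    thread_of = {}
    for msg in messages:
        parent = by_id.get(msg["parent"])
        is_reply = PREFIX.match(msg["subject"]) is not None
        same_subject = parent is not None and (
            normalize_subject(parent["subject"])
            == normalize_subject(msg["subject"])
        )
        if parent is not None and (is_reply or same_subject):
            # Headers and subject agree that this continues the thread.
            thread_of[msg["id"]] = thread_of[parent["id"]]
        else:
            # No known parent, or a reply reused to start a new topic.
            thread_of[msg["id"]] = msg["id"]
        by_id[msg["id"]] = msg
    return thread_of
```

A message sent as "Re: Budget (was: Meeting notes)" stays in its thread because it keeps the reply prefix, while a reused reply with a brand-new subject starts its own thread. The rule misfires when someone edits the subject and also drops the "Re:", so treat it as a starting point.
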
In any case the threading mechanism of listen could certainly use some
rethinking as it's pretty basic. My suggestion is to look at jwz's
mail threading algorithm. There's even a Python implementation:
http://www.amk.ca/python/code/jwz
The algorithm seems to be a pretty smart combination of analyzing
headers and subject line, though I'm not quite sure it will handle the
case you're talking about any better than the current listen
mechanism.
Good luck,
Alec
--
Archive:
http://www.openplans.org/projects/listen/lists/listen-dev/archive/2007/06/1182989059782