Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-09 Thread Barry Warsaw
On Apr 09, 2012, at 07:10 AM, Richard Wackerbarth wrote:

I support the concept of Stable URI. The concept of using a hash into a large
namespace is probably adequate.  However, at a minimum, the URI SHOULD
include an easily identifiable schema-revision indicator.  That way, if the
present scheme is found lacking, we can, compatibly, switch to a new schema
and a new namespace.

Should we attempt to push the stable URI concept as an RFC?  Does anybody
(Murray perhaps) have the interest and time to do that?  I think the RFC would
be pretty simple.

Having an RFC would also be nice for getting rid of the X- prefix.

In any event, we can declare the algorithm on our current wiki page to be
version 1.0 of our stable URI definition.  Archiver search algorithms can
expose this version number in their URLs if they're so inclined.  E.g.:

http://mail.example.com/1.0/7GC2V6BEDVME27VQ34W7AXMFPA3H2YWW

I should probably also be able to find the message this way:

http://mail.example.com/search?message-id=%3C20120409152339.16496.75486%40foo.example.org%3E

and probably

http://mail.example.com/search?strict=1&message-id=20120409152339.16496.75486%40foo.example.org

and maybe others.
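
A minimal sketch of how the URL shapes above could be produced. The host and
query parameter names are taken from the examples; the helper names are
hypothetical, and joining strict=1 and message-id with "&" is an assumption.

```python
from urllib.parse import quote, urlencode

BASE = "http://mail.example.com"  # example host used in this message


def versioned_url(message_id_hash, version="1.0"):
    """Stable URL with the scheme version as a path component."""
    return f"{BASE}/{version}/{quote(message_id_hash, safe='')}"


def search_url(message_id, strict=False):
    """Search URL keyed on the Message-ID (hypothetical helper).

    The default form keeps the angle brackets; the strict form sends the
    bare id and adds strict=1.
    """
    params = {}
    if strict:
        params["strict"] = 1
        message_id = message_id.strip("<>")
    params["message-id"] = message_id
    return f"{BASE}/search?{urlencode(params)}"
```

Percent-encoding is left to urllib so that "<", ">" and "@" come out as %3C,
%3E and %40, matching the examples.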

-Barry
___
Mailman-Developers mailing list
Mailman-Developers@python.org
http://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: 
http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org

Security Policy: http://wiki.list.org/x/QIA9


Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-09 Thread Stephen J. Turnbull
On Tue, Apr 10, 2012 at 12:28 AM, Barry Warsaw ba...@list.org wrote:
 On Apr 09, 2012, at 07:10 AM, Richard Wackerbarth wrote:

I support the concept of Stable URI.

 Should we attempt to push the stable URI concept as an RFC?  Does anybody
 (Murray perhaps) have the interest and time to do that?  I think the RFC would
 be pretty simple.

I don't think we have sufficient agreement on how to implement yet.

 Having an RFC would also be nice for getting rid of the X- prefix.

AIUI, the X- prefix is now considered a bad idea for public protocols
in any case.  I don't think we need an RFC for it until we're pretty
sure we have it right.

 In any event, we can declare the algorithm on our current wiki page to be
 version 1.0 of our stable URI definition.  Archiver search algorithms can
 expose this version number in their URLs if they're so inclined.

IMHO, our stable URIs should work on any of the servers we might
connect to to retrieve the message.  In terms of best current
practice, Gmane has offered stable URLs for about a decade now:

http://msgid.gmane.org/20120409152339.16496.75...@foo.example.org

To put it on the wire to Gmane, just URL-encode the message-id and be
done with it.  IMO, the ideal would be just like netnews:


list-archive://mailman-developers.python.org/20120409152339.16496.75...@foo.example.org

The List-ID is not entirely redundant due to cross-posting.

In this scheme, it's up to the MUA to decide which archive(s) to query
for this, just as with netnews looking for a newsgroup.  I really
don't see why the stable URI would want to be anything else.

So the scheme on the wiki seems overengineered to me, with the
possible exception of the "industrial-strength message IDs are too
long for the footer" problem.  But

http://mail.example.com/1.0/7GC2V6BEDVME27VQ34W7AXMFPA3H2YWW

is really too long for a footer too; what we want are tinyurls.  So I
think that footer URLs should be considered a different problem from
the stable URI problem.


Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-08 Thread Barry Warsaw
On Apr 05, 2012, at 05:29 PM, Terri Oda wrote:

I haven't read the whole thread so maybe someone else has mentioned this, but
we may want to take advantage of the dynamic sublists code for this, since it
produces conversations or topics sublists and already has to generate and
maintain a code for each.  Rather than messageids these are meant to be a bit
more human-readable, so they're often words with numbers suffixed.  But yeah;
there exists code for Mailman 2.1 that might be reusable here, and there's a
GSoC project on the table to port to 3.0 so this might be a thing that we
could pass to the archive utility.

Don't forget too that we have the Stable URL proposal, which turns arbitrary
Message-IDs into 32-character base 32 hashes made up of upper-case ASCII
letters and digits.
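
As a rough illustration of where 32 characters come from: a 160-bit digest
base32-encodes to exactly 32 characters with no padding. The sketch below
assumes SHA-1 over the Message-ID with its angle brackets stripped. The thread
does not spell out the digest, so treat that choice and the function name as
assumptions and check the wiki page for the authoritative definition.

```python
import hashlib
from base64 import b32encode


def message_id_hash(message_id):
    # Assumption: surrounding angle brackets are not part of the hashed value.
    if message_id.startswith("<") and message_id.endswith(">"):
        message_id = message_id[1:-1]
    digest = hashlib.sha1(message_id.encode("utf-8")).digest()  # 20 bytes
    # 20 bytes = 160 bits = 32 base32 characters (A-Z and 2-7), no padding.
    return b32encode(digest).decode("ascii")
```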

-Barry


Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-08 Thread Richard Wackerbarth
I would propose a slightly different scheme for converting messages to stable 
URIs.

If we create our ID by concatenation of some hash and a part of the date, then 
the mail server need remember only those messages that fall in the same 
date-sensitive part of the namespace. It can forget about ancient history.
Further, if we maintain sufficient Hamming distance, we can perform error
correction (mapping multiple IDs to the same canonical one) and thus
compensate for minor encoding differences caused by timing skew.
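
One possible reading of the first half of this proposal, as a sketch only:
prefix the hash with a coarse date bucket so that a server can drop whole
buckets. The year-month granularity, the "-" separator, the SHA-1 digest and
both function names are assumptions made for illustration. The error
correction idea is not attempted here.

```python
import hashlib
from base64 import b32encode
from datetime import datetime, timezone


def dated_id(message_id, sent):
    """Hypothetical ID of the form <YYYYMM>-<base32 hash of the Message-ID>.

    `sent` should be a timezone-aware datetime.
    """
    bucket = sent.astimezone(timezone.utc).strftime("%Y%m")
    digest = hashlib.sha1(message_id.strip("<>").encode("utf-8")).digest()
    return f"{bucket}-{b32encode(digest).decode('ascii')}"


def forget_before(ids, oldest_bucket):
    """Keep only the IDs whose date bucket is at or after oldest_bucket."""
    return {i for i in ids if i.split("-", 1)[0] >= oldest_bucket}
```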


On Apr 8, 2012, at 12:38 PM, Barry Warsaw wrote:

 On Apr 05, 2012, at 05:29 PM, Terri Oda wrote:
 
 I haven't read the whole thread so maybe someone else has mentioned this, but
 we may want to take advantage of the dynamic sublists code for this, since it
 produces conversations or topics sublists and already has to generate and
 maintain a code for each.  Rather than messageids these are meant to be a bit
 more human-readable, so they're often words with numbers suffixed.  But yeah;
 there exists code for Mailman 2.1 that might be reusable here, and there's a
 GSoC project on the table to port to 3.0 so this might be a thing that we
 could pass to the archive utility.
 
 Don't forget too that we have the Stable URL proposal, which turns arbitrary
 Message-IDs into 32 upper-case ASCII letter and digit character base 32
 hashes.
 
 -Barry



Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-08 Thread Barry Warsaw
On Apr 07, 2012, at 10:53 PM, David Jeske wrote:

Perhaps I misunderstood. If you are going to have a record of the deletion
(i.e. you can keep a deleted message around in some form), this problem
becomes much easier. I thought this desire was to have stable urls and
threads when you rebuild and a message is missing.

Absolutely if there is a message 'deletion' feature, it should delete the
message contents but leave a 'stub' that links the message-id and
references/in-reply-to, so it can help hold the thread together during a
rebuild. My memory is foggy, but I think we used a technique like this in
Yahoo Groups.

I like the scheme outlined by Toshio where (IIRC) any message-id can be used
to index into its thread.  I also agree with David that a deletion should keep
enough of a stub around to maintain consistent thread links.  I think this is
also important for the end-user.

Imagine you've found a particular taken-down message through a search engine
cache.  You then follow the url.  I think it would be better to give them an
informative message about the take-down rather than just 404'ing the url
(although the latter or similar might also be useful for spiders so that they
know the message is no longer available).

Stephen observes that complete deletion is occasionally necessary.  While
true, I still think a placeholder/stub could be inserted to keep the thread
integrity whole.
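
A sketch of the stub idea under stated assumptions: the record layout, the
names, and the choice of HTTP 410 (Gone) for taken-down messages are
illustrative, not something the thread settled on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArchivedMessage:
    message_id: str
    references: List[str] = field(default_factory=list)
    body: str = ""
    taken_down: bool = False

    def take_down(self):
        # Drop the content but keep the ids that hold the thread together.
        self.body = ""
        self.taken_down = True


def render(msg):
    """Return (status, text) for a message URL; msg is None if never archived."""
    if msg is None:
        return 404, "No such message."
    if msg.taken_down:
        return 410, "This message was removed from the archive."
    return 200, msg.body
```

A distinct status for the stub gives human visitors an explanation while
still telling spiders that the content is no longer available.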

Cheers,
-Barry


Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-08 Thread Barry Warsaw
On Apr 08, 2012, at 01:11 PM, Richard Wackerbarth wrote:

I would propose a slightly different scheme for converting messages to stable
URIs.

If we create our ID by concatenation of some hash and a part of the date,
then the mail server need remember only those messages that fall in the same
date-sensitive part of the namespace. It can forget about ancient history.

Hi Richard,

We had a very lengthy discussion about the hash a year or so ago, when the
current algorithm was agreed upon.  I'm too swamped at the moment to dig up
the links, but look for input from Jeff Breidenbach and Jeff Marshall.

The conclusion was that Message-ID was both sufficient and preferable as the
sole input into the X-Message-ID-Hash value used for stable URLs.

Of course date information could certainly be used to determine expiration
from any kind of Message-ID cache for LMTP acceptance purposes.  It doesn't
have to be part of the hash input for that.
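
A small sketch of that separation, assuming a cache keyed on the Message-ID
that records when each id was first seen. The class name, method names, and
the use of arrival time rather than the Date header are assumptions.

```python
from datetime import datetime, timedelta


class MessageIdCache:
    """Remembers recently seen Message-IDs; names and policy are illustrative."""

    def __init__(self, max_age):
        self.max_age = max_age
        self._seen = {}  # Message-ID -> time it was first seen

    def seen_before(self, message_id, now):
        """Record message_id and report whether it was already cached."""
        self.expire(now)
        if message_id in self._seen:
            return True
        self._seen[message_id] = now
        return False

    def expire(self, now):
        # Date information drives expiry only; it never enters the hash.
        cutoff = now - self.max_age
        self._seen = {m: t for m, t in self._seen.items() if t >= cutoff}
```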

Note though that Mailman has long had a feature to clobber the Date header
when forwarding the message on to the archive.  In mm2.1 this was closely tied
to Pipermail, but in mm3 this can be enabled for any archiver.  The problem
was that Date headers can get skewed enough that it would cause threading
problems in Pipermail.  It's probably true that most bogus Date headers come
from spam (trying to get their message at the top or bottom of my date sorted
inbox summary).

Further, if we maintain sufficient Hamming distance, we can perform error
correction (mapping multiple IDs to the same canonical one) and, thus
compensate for minor encoding differences caused by timing skew.

Hmm, I'm having trouble seeing how useful this would be if the Date is not
used to calculate the stable url.

-Barry



Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-07 Thread Stephen J. Turnbull
Bill Janssen has one too.  I forget if it's based on amk's or not, but
it's current.  See thread in email-sig:

http://mail.python.org/pipermail/email-sig/2012-January/000882.html

On Sat, Apr 7, 2012 at 2:54 AM, Toshio Kuratomi a.bad...@gmail.com wrote:
 On Fri, Apr 06, 2012 at 12:10:22AM +0900, Stephen J. Turnbull wrote:
 Some Message-IDs will not have
 corresponding messages but that's always a problem with threading (see
 http://www.jwz.org/doc/threading.html, and RFC 5256).

 There are other problems with threading that need to be dealt with as
 well, such as References being inconsistent across messages in the
 same thread and people who continue a thread with a new message, etc.

 Looks like amk coded jwz's algorithm into a python library too:
  https://github.com/akuchling/jwzthreading

 All other links to that code that I found (on amk.ca and bitbucket) were
 broken so someone may want to clone that/ask andrew what's going on with it
 :-)

 -Toshio



Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-07 Thread David Jeske
On Apr 6, 2012 10:49 AM, Toshio Kuratomi a.bad...@gmail.com wrote:
  1) don't publish thread-ids, but just message-ids... for example, a thread
  URL could be allowed to reference the message-id of 'any' message in the
  thread. They could then include more than one message-id, making them
  resilient to a lost messageid later. if some messageid are lost, hopefully
  a url someone is holding onto has another messageid that was not lost.
 
 This sounds good.  So instead of relying on the first message-id of the thread
 we internally keep a mapping of all message-ids and stableurl hashes to
 either an internal message-id or a tree of messages in the thread.

I think of this as keeping a mapping from rfc822 message-id to internal
thread-id. I think you are using different words to say the same thing.

 When deleting messages, always retain the message-id and stableurl hashes
 for that message in the mapping.  That way a url that pointed to the thread
 by that message-id will continue to function even though the message itself
 has been deleted.

Perhaps I misunderstood. If you are going to have a record of the deletion
(i.e. you can keep a deleted message around in some form), this problem
becomes much easier. I thought this desire was to have stable urls and
threads when you rebuild and a message is missing.

Absolutely if there is a message 'deletion' feature, it should delete the
message contents but leave a 'stub' that links the message-id and
references/in-reply-to, so it can help hold the thread together during a
rebuild. My memory is foggy, but I think we used a technique like this in
Yahoo Groups.


Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-06 Thread David Jeske
On Apr 5, 2012 6:42 AM, Pierre-Yves Chibon pin...@pingoured.fr wrote:
 So I changed to use the Message-ID of the first email of the Thread as
 ThreadID.
 Problem is of course, if the admin removes the first email of a thread
 for x or y reasons, then when reloading the archives (for z or a
 reasons), we will lose the ThreadID and actually, the integrity of the
 Thread (each reply to the first email will be split into their own
 thread).

 Would anyone have an idea on how to generate a stable and delete/reload
 proof ThreadID?

I believe deletion proof (i.e. stable thread-ids in the case of arbitrary
deletions) may be provably not possible.

If you really want to be resilient to arbitrary deletions/reloads, I think
your solution is ultimately going to involve referencing more than one
message in thread URLs..

For example, here is a scheme where 'messages in the thread name the
thread':

1) don't publish thread-ids, but just message-ids... for example, a thread
URL could be allowed to reference the message-id of 'any' message in the
thread. They could then include more than one message-id, making them
resilient to a lost messageid later. if some messageid are lost, hopefully
a url someone is holding onto has another messageid that was not lost.

As for how to pick the message-ids, paged display could include a messageid
for a message on the page, in addition to the 'first' messageid of the
thread.

2) create an 'internal only threadid' which you use to correlate messages
together into a thread. (don't show this to anyone) you could generate this
as a GUID, Hash, or the message-id of the message..doesn't matter, since
nobody will see it...

3) when indexing messages, search in both directions
(references/in-reply-to -> messageid, and vice-versa) to find out if the
message belongs in a thread.. if it does, then adopt the 'internal thread
id'.. if you find two different threadids in the two directions, then
rewrite/combine into a single internal-thread-id

- urls can be somewhat resilient to deleted/missing messages within a
thread... and completely resilient to changes in other threads
- threads can be manually edited and merged/split after the fact, with
some level of success
- could be designed to 'break down' threads that get too big, again with
minimal damage, and some url compatibility
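
The three steps above can be sketched roughly as follows. This is an
illustration under assumptions, not a worked-out design: the class and method
names are hypothetical, the internal thread id is a plain counter, and
"search in both directions" is approximated by also recording referenced ids
that have not been archived yet.

```python
import itertools


class ThreadIndex:
    """Maps Message-IDs to internal thread ids that are never published."""

    def __init__(self):
        self._thread_of = {}            # Message-ID -> internal thread id
        self._ids = itertools.count(1)

    def add(self, message_id, references=()):
        """Index a message and return its internal thread id."""
        linked = [message_id, *references]
        found = {self._thread_of[m] for m in linked if m in self._thread_of}
        thread = min(found) if found else next(self._ids)
        # Two or more existing threads turned out to be one: merge them.
        for m, t in self._thread_of.items():
            if t in found:
                self._thread_of[m] = thread
        # Referenced ids are recorded even if those messages are not archived,
        # so a parent that shows up later joins the same thread.
        for m in linked:
            self._thread_of[m] = thread
        return thread

    def resolve(self, *message_ids):
        """Look up a thread from any of the Message-IDs carried in a URL."""
        for m in message_ids:
            if m in self._thread_of:
                return self._thread_of[m]
        return None
```

Because resolve() accepts several Message-IDs, a URL that names more than one
message still finds its thread when some of those ids are unknown.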


Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-06 Thread Toshio Kuratomi
On Fri, Apr 06, 2012 at 12:00:49AM -0700, David Jeske wrote:
 On Apr 5, 2012 6:42 AM, Pierre-Yves Chibon pin...@pingoured.fr wrote:
  So I changed to use the Message-ID of the first email of the Thread as
  ThreadID.
  Problem is of course, if the admin removes the first email of a thread
  for x or y reasons, then when reloading the archives (for z or a
  reasons), we will lose the ThreadID and actually, the integrity of the
  Thread (each reply to the first email will be split into their own
  thread).
 
  Would anyone have an idea on how to generate a stable and delete/reload
  proof ThreadID?
 
 I believe deletion proof (i.e. stable thread-ids in the case of arbitrary
 deletions) may be provably not possible.
 
 If you really want to be resilient to arbitrary deletions/reloads, I think
 your solution is ultimately going to involve referencing more than one
 message in thread URLs..
 
I don't see any way to make this 100% resilient against deletion + reload
(where reload == from the available messages without the benefit of the old
metadata) either.  I think with slight modification to your steps below, we
can get to resiliency against deletion or resiliency against total reload.

 For example, here is a scheme where 'messages in the thread name the
 thread':
 
 1) don't publish thread-ids, but just message-ids... for example, a thread
 URL could be allowed to reference the message-id of 'any' message in the
 thread. They could then include more than one message-id, making them
 resilient to a lost messageid later. if some messageid are lost, hopefully
 a url someone is holding onto has another messageid that was not lost.
 
This sounds good.  So instead of relying on the first message-id of the thread
we internally keep a mapping of all message-ids and stableurl hashes to
either an internal message-id or a tree of messages in the thread.

When deleting messages, always retain the message-id and stableurl hashes
for that message in the mapping.  That way a url that pointed to the thread
by that message-id will continue to function even though the message itself
has been deleted.
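
A minimal sketch of "retain the mapping on delete", assuming an in-memory
archive. The class, its attributes, and the idea of storing Message-IDs and
stable-URL hashes in one lookup table are illustrative assumptions.

```python
class Archive:
    """Keeps message content and the id-to-thread mapping separately."""

    def __init__(self):
        self.messages = {}   # Message-ID -> message text
        self.thread_of = {}  # Message-ID or stable-URL hash -> thread id

    def store(self, message_id, url_hash, thread_id, text):
        self.messages[message_id] = text
        self.thread_of[message_id] = thread_id
        self.thread_of[url_hash] = thread_id

    def delete(self, message_id):
        # Remove the content only; the mapping entries are kept on purpose
        # so that URLs naming this message still reach the thread.
        self.messages.pop(message_id, None)

    def thread_for(self, key):
        return self.thread_of.get(key)
```

This covers deletion but not a total reload: the mapping survives only as
long as the metadata itself is preserved.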

-Toshio



Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-06 Thread Toshio Kuratomi
On Fri, Apr 06, 2012 at 12:10:22AM +0900, Stephen J. Turnbull wrote:
 Some Message-IDs will not have
 corresponding messages but that's always a problem with threading (see
 http://www.jwz.org/doc/threading.html, and RFC 5256).
 
 There are other problems with threading that need to be dealt with as
 well, such as References being inconsistent across messages in the
 same thread and people who continue a thread with a new message, etc.

Looks like amk coded jwz's algorithm into a python library too:
  https://github.com/akuchling/jwzthreading

All other links to that code that I found (on amk.ca and bitbucket) were
broken so someone may want to clone that/ask andrew what's going on with it
:-)

-Toshio



Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-05 Thread Stephen J. Turnbull
On Thu, Apr 5, 2012 at 10:41 PM, Pierre-Yves Chibon pin...@pingoured.fr wrote:

 In HyperKitty to be able to easily retrieve from the database all the
 threads of a given month or just all the emails of a thread, I created a
 Field in the database called ThreadID.
 When I load the archives from mailman into mongo, I look for the absence
 of the headers 'References' or 'In-Reply-To' to define an email that
 starts a new thread.

This fails when a thread crosses channels.  Eg,

To: Pierre
From: Steve
Message-Id: x@y.z

is followed by

To: Steve
From: Pierre
Cc: SomeList
References: x@y.z
Message-Id: a@b.c

 Would anyone have an idea on how to generate a stable and delete/reload
 proof ThreadID?

I don't see how this can be possible.  Eg, in the above scenario you
construct a thread based on your reply to me.  Then I go, oh, really
I should have posted to mm-dev and repost the thread.  So the
Message-ID of root message fails, and I don't see an alternative
that can be predicted.  So it may as well be arbitrary (eg, any
message in the thread) and stored in the database with appropriate
linkage from thread IDs to message IDs (one-to-many), and vice versa
(many-to-one).

 The other solution of course being that I regenerate the thread on the
 fly based on the first email (which is still easy to find), but that
 will be a lot of db querying.

I haven't thought about it deeply, but I would say just give the
thread an arbitrary ID in the database.  Message-IDs are supposed to
be universally unique, so what's wrong with keeping the thread in the
database as a tree of message IDs?  Some Message-IDs will not have
corresponding messages but that's always a problem with threading (see
http://www.jwz.org/doc/threading.html, and RFC 5256).
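
One way to picture that storage, as a sketch with SQLite: an arbitrary integer
thread id, a many-to-one link from Message-IDs to threads, a parent pointer
for the tree, and a flag for Message-IDs that are known but have no archived
message. The table, column, and function names are all assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE thread (thread_id INTEGER PRIMARY KEY);
CREATE TABLE message (
    message_id TEXT PRIMARY KEY,
    thread_id  INTEGER NOT NULL REFERENCES thread(thread_id),
    parent_id  TEXT,                        -- tree edge; NULL for a root
    archived   INTEGER NOT NULL DEFAULT 1   -- 0: id known, message missing
);
"""


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def new_thread(db):
    """Create a thread with an arbitrary (auto-assigned) id."""
    return db.execute("INSERT INTO thread DEFAULT VALUES").lastrowid


def add_message(db, message_id, thread_id, parent_id=None, archived=True):
    db.execute("INSERT INTO message VALUES (?, ?, ?, ?)",
               (message_id, thread_id, parent_id, int(archived)))


def thread_of(db, message_id):
    """Many-to-one: any Message-ID leads to its thread id, or None."""
    row = db.execute("SELECT thread_id FROM message WHERE message_id = ?",
                     (message_id,)).fetchone()
    return row[0] if row else None


def messages_in(db, thread_id):
    """One-to-many: all Message-IDs recorded for a thread."""
    rows = db.execute("SELECT message_id FROM message WHERE thread_id = ? "
                      "ORDER BY message_id", (thread_id,))
    return [r[0] for r in rows]
```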

There are other problems with threading that need to be dealt with as
well, such as References being inconsistent across messages in the
same thread and people who continue a thread with a new message, etc.


Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-05 Thread Richard Wackerbarth
I agree with Steve. In general you cannot solve the problem with only the 
information contained in the message headers. You will need to maintain 
parallel meta-data for each message/thread. The header info would provide an 
initialization at the time of insertion. Presumably this thread tree could be 
edited by an administrator to correct broken chains, etc.

On Apr 5, 2012, at 10:10 AM, Stephen J. Turnbull wrote:

 On Thu, Apr 5, 2012 at 10:41 PM, Pierre-Yves Chibon pin...@pingoured.fr 
 wrote:
 
 In HyperKitty to be able to easily retrieve from the database all the
 threads of a given month or just all the emails of a thread, I created a
 Field in the database called ThreadID.
 When I load the archives from mailman into mongo, I look for the absence
 of the headers 'References' or 'In-Reply-To' to define an email that
 starts a new thread.
 
 This fails when a thread crosses channels.  Eg,
 
 To: Pierre
 From: Steve
 Message-Id: x@y.z
 
 is followed by
 
 To: Steve
 From: Pierre
 Cc: SomeList
 References: x@y.z
 Message-Id: a@b.c


 I haven't thought about it deeply, but I would say just give the
 thread an arbitrary ID in the database.  Message-IDs are supposed to
 be universally unique, so what's wrong with keeping the thread in the
 database as a tree of message IDs?  Some Message-IDs will not have
 corresponding messages but that's always a problem with threading (see
 http://www.jwz.org/doc/threading.html, and RFC 5256).
 
 There are other problems with threading that need to be dealt with as
 well, such as References being inconsistent across messages in the
 same thread and people who continue a thread with a new message, etc.



Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-05 Thread Pierre-Yves Chibon
On Fri, 2012-04-06 at 00:10 +0900, Stephen J. Turnbull wrote:
 On Thu, Apr 5, 2012 at 10:41 PM, Pierre-Yves Chibon pin...@pingoured.fr 
 wrote:
 
  In HyperKitty to be able to easily retrieve from the database all the
  threads of a given month or just all the emails of a thread, I created a
  Field in the database called ThreadID.
  When I load the archives from mailman into mongo, I look for the absence
  of the headers 'References' or 'In-Reply-To' to define an email that
  starts a new thread.
 
 This fails when a thread crosses channels.  Eg,
 
 To: Pierre
 From: Steve
 Message-Id: x@y.z
 
 is followed by
 
 To: Steve
 From: Pierre
 Cc: SomeList
 References: x@y.z
 Message-Id: a@b.c
 
  Would anyone have an idea on how to generate a stable and delete/reload
  proof ThreadID?
 
 I don't see how this can be possible.  Eg, in the above scenario you
 construct a thread based on your reply to me.  Then I go, oh, really
 I should have posted to mm-dev and repost the thread.  So the
 Message-ID of root message fails, and I don't see an alternative
 that can be predicted.  So it may as well be arbitrary (eg, any
 message in the thread) and stored in the database with appropriate
 linkage from thread IDs to message IDs (one-to-many), and vice versa
 (many-to-one).

Ok, I missed something here.
So when it parses the email, it checks for 'References' or
'In-Reply-To'.
- If it finds them, it looks for the preceding email
- if it finds the preceding email, then the current email gets the
ThreadID from the preceding email
- if it does not find the preceding email, then the current email is
assumed to be a new thread and thus its ThreadID is its Message-ID
- if it does not find 'References' or 'In-Reply-To', then the current
email is assumed to be a new thread and thus its ThreadID is its
Message-ID

So for the example you give, the archiver will receive your email and
make a new thread out of it.
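
The rules listed above can be sketched as a single function. The dict-based
message and archive representations are assumptions, as is trying the nearest
ancestor first when several ids are listed. This is not the HyperKitty code.

```python
def assign_thread_id(headers, archive):
    """Return the ThreadID for a message under the rules listed above.

    headers: dict with 'Message-ID' and optionally 'References' and
             'In-Reply-To' (whitespace-separated Message-IDs).
    archive: dict mapping archived Message-IDs to their ThreadIDs;
             it is updated in place.
    """
    message_id = headers["Message-ID"]
    parents = headers.get("References", "").split()
    parents += headers.get("In-Reply-To", "").split()
    for parent in reversed(parents):  # nearest ancestor first
        if parent in archive:
            thread_id = archive[parent]
            break
    else:
        # No reply headers, or the preceding email is not in the archive:
        # the message starts a new thread named after itself.
        thread_id = message_id
    archive[message_id] = thread_id
    return thread_id
```

With the example from the quoted message, a@b.c references x@y.z, which never
reached the list, so a@b.c becomes the ThreadID and later replies inherit it.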

  The other solution of course being that I regenerate the thread on the
  fly based on the first email (which is still easy to find), but that
  will be a lot of db querying.
 
 I haven't thought about it deeply, but I would say just give the
 thread an arbitrary ID in the database.  Message-IDs are supposed to
 be universally unique, so what's wrong with keeping the thread in the
 database as a tree of message IDs?  Some Message-IDs will not have
 corresponding messages but that's always a problem with threading (see
 http://www.jwz.org/doc/threading.html, and RFC 5256).

The idea of using the Message-ID for ThreadID (instead of an integer) is
that, whether I load one month or two months of archives into the
database, the link to the thread
(http://mm3test.fedoraproject.org/thread/packaging@fp.o/XU7HT5JC5GND2O4JII7MTQILLTB4IN4S)
will remain the same (so consistent urls).

 There are other problems with threading that need to be dealt with as
 well, such as References being inconsistent across messages in the
 same thread and people who continue a thread with a new message, etc.

For these I am not sure I can do something (at least automatically, we
could always allow an admin to edit the field).

Pierre



Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-05 Thread Richard Wackerbarth
Pierre,

There is nothing wrong with using a message ID as a thread ID. They are 
different namespaces (with an intuitive mapping for the first post.)

The problem is only that the mapping is not stable under the restore after 
deleting some messages scenario.
If you expect to be able to restore messages and keep stable thread IDs, then 
you will need to assure that the mapping of message to thread ID does not 
depend on the presence of other messages remaining in the database.

Richard

On Apr 5, 2012, at 1:42 PM, Pierre-Yves Chibon wrote:

 The idea of using the Message-ID for ThreadID (instead of an integer) is
 that, whether I load one month or two months of archives into the
 database, the link to the thread
 (http://mm3test.fedoraproject.org/thread/packaging@fp.o/XU7HT5JC5GND2O4JII7MTQILLTB4IN4S)
 will remain the same (so consistent urls).
 
 There are other problems with threading that need to be dealt with as
 well, such as References being inconsistent across messages in the
 same thread and people who continue a thread with a new message, etc.
 
 For these I am not sure I can do something (at least automatically, we
 could always allow an admin to edit the field).
 
 Pierre
 


Re: [Mailman-Developers] From the creation of a ThreadID

2012-04-05 Thread Mark Sapiro
Pierre-Yves Chibon wrote:

Ok, I missed something here.
So when it parses the email, it checks for 'References' or
'In-Reply-To'.
- If it finds them, it looks for the preceding email
- if it finds the preceding email, then the current email gets the
ThreadID from the preceding email
- if it does not find the preceding email, then the current email is
assumed to be a new thread and thus its ThreadID is its Message-ID
- if it does not find 'References' or 'In-Reply-To', then the current
email is assumed to be a new thread and thus its ThreadID is its
Message-ID


This is still incomplete. One of the MUAs I use generates In-Reply-To:
headers but not References: headers. Thus in cases where someone has
replied to me but not included the list (and may or may not have
subsequently sent the reply to the list with a different Message-ID),
and I reply and include the list, the Message-ID in my In-Reply-To: is
not in the archive.

Another situation is someone replies to me and the list, but the list
reply is greylisted and not retried for a while. Meanwhile, I reply to
my copy and the Message-ID in my In-Reply-To: is not yet in the
archive.

Threading is not easy.

-- 
Mark Sapiro m...@msapiro.net        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
