Dimid Duchovny <dim...@gmail.com> wrote:
> However, I realized that the last step (walking) is redundant,
> since that could be done by the library itself in the threading or
> ordering stages.

I think what you want is best done in the storage/indexing
stage, whereas msgthr is intended for displaying/rendering
results that were retrieved from some sort of search engine.

At least that's how notmuch does it, and I stole the logic for
public-inbox(*) as they both use Xapian.  I think mairix does
something similar, too; but it's been a while...

> E.g. keeping track of each container's thread,
> and when adding a message A as a child of message B, to point A's
> thread to B's one.
> We could use an array with a single element,
> or some other solution to have pass-by-reference semantics.
> Finally, all top-level containers should have their own msg_id as the thread,
> and all their descendants will point to it as well.
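
To make the quoted idea concrete, here is a minimal sketch in
Python (not msgthr's Ruby API; Container, ThreadRef and
add_child are made-up names).  One assumption to flag: a plain
single-element array is not quite enough if A already has
children when it gets attached to B, so this sketch lets a cell
forward to another cell instead:

```python
class ThreadRef:
    """Mutable cell shared by the containers of one thread (hypothetical)."""

    def __init__(self, msg_id):
        self.msg_id = msg_id  # thread id while this cell is a root
        self.forward = None   # set once this thread is merged into another

    def resolve(self):
        # Follow forwarding links to the live cell.
        root = self
        while root.forward is not None:
            root = root.forward
        # Compress the path so later lookups are short.
        node = self
        while node.forward is not None:
            node.forward, node = root, node.forward
        return root


class Container:
    def __init__(self, msg_id):
        self.msg_id = msg_id
        self.children = []
        # A top-level container uses its own msg_id as the thread.
        self.thread = ThreadRef(msg_id)

    def thread_id(self):
        return self.thread.resolve().msg_id


def add_child(parent, child):
    parent.children.append(child)
    child_root = child.thread.resolve()
    parent_root = parent.thread.resolve()
    if child_root is not parent_root:
        # One pointer update redirects the child's whole subtree.
        child_root.forward = parent_root


# Messages may be linked out of order: c under b first, then b under a.
a, b, c = Container("a@x"), Container("b@x"), Container("c@x")
add_child(b, c)
add_child(a, b)
```

After both calls, a, b and c all report a's msg_id as their
thread without a separate walking step.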

One advantage to doing this in the storage phase is this info is
persistent and you don't need to calculate it every time.  This
is great when you're dealing with more message skeletons than
can fit in memory.  git@vger has over 300k messages, LKML will
have several million messages, and they both use String
Message-IDs (being email), so it'll be many hundreds of MB just
in containers and Message-IDs.

Another huge advantage of doing this in the indexing phase is
that you can search for something in a single message and then
easily pull every message from the thread it belongs to with a
boolean thread_id search.  I also find the "-t" switch of
mairix useful for my private mail.
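
A rough sketch of that lookup, in Python rather than Perl.
ThreadIndex and its in-memory dicts are made-up stand-ins for a
persistent index such as Xapian; this is not the public-inbox
or notmuch code, only the general idea of assigning a thread_id
at index time:

```python
from collections import defaultdict


class ThreadIndex:
    """Toy index: assigns a thread_id to each message as it is added."""

    def __init__(self):
        self.thread_of = {}               # Message-ID -> thread_id
        self.members = defaultdict(list)  # thread_id -> indexed Message-IDs
        self.next_id = 1

    def add(self, msg_id, references=()):
        ids = [msg_id, *references]
        # Thread ids already known for this message or anything it references.
        known = {self.thread_of[i] for i in ids if i in self.thread_of}
        if known:
            tid = min(known)
        else:
            tid = self.next_id
            self.next_id += 1
        # Merge any other threads this message turns out to connect.
        stale = known - {tid}
        if stale:
            for key, value in self.thread_of.items():
                if value in stale:
                    self.thread_of[key] = tid
            for other in stale:
                self.members[tid].extend(self.members.pop(other, []))
        # Referenced but not-yet-seen messages are remembered as ghosts,
        # so a parent indexed after its reply still joins the thread.
        for i in ids:
            self.thread_of[i] = tid
        self.members[tid].append(msg_id)
        return tid

    def thread_for(self, msg_id):
        # The equivalent of a boolean thread_id search.
        return list(self.members[self.thread_of[msg_id]])


idx = ThreadIndex()
idx.add("b@x", ["a@x"])          # reply indexed before its parent
idx.add("c@x", ["a@x", "b@x"])
idx.add("z@x")                   # unrelated message
idx.add("a@x")                   # parent arrives last
```

Here idx.thread_for("c@x") returns a@x, b@x and c@x (in
indexing order) without rebuilding the tree, and z@x stays in
its own thread.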

I can help you understand how public-inbox does this in
SearchIdx.pm (indexer) and Search.pm (read-only queries) if
you're not familiar with Perl5, but for now you can grab the
code and try understanding it on your own:

        git clone https://public-inbox.org/public-inbox

http://repo.or.cz/public-inbox.git/blob/4f2f0eb94739edf:/lib/PublicInbox/SearchIdx.pm
http://repo.or.cz/public-inbox.git/blob/4f2f0eb94739edf:/lib/PublicInbox/Search.pm

I'll be happy to answer questions on m...@public-inbox.org
about it :)

> Would you consider adding such a feature? If so, I'll be happy to work
> out the details and submit a patch.

I'm not sure if it makes sense to add this without a stable
storage backend (Xapian or some other search indexer/DB).

Another potential problem with adding this to msgthr is that
msgthr is GPL-2+ (since it's a port of Mail::Thread from CPAN),
but the notmuch algorithm is GPL-3+, so I'm not allowed to put
it into a GPL-2+ project (AGPL-3+ is OK).

Maybe you can cite prior art from mairix (GPL-2+), but I haven't
looked at that code in many years and don't remember it.
--
unsubscribe: msgthr-public+unsubscr...@80x24.org
archive: https://80x24.org/msgthr-public/
