Konstantin Ryabitsev <[email protected]> wrote:
> On Mon, Mar 27, 2023 at 09:38:49PM +0000, Eric Wong wrote:
> > I thought about that, too; but I'm worried about having one-off
> > stuff that ends up needing to be supported indefinitely.
> >
> > JMAP for this would take more time, but I'd be more comfortable
> > carrying it long-term.
> >
> > I don't expect trimming after the first paragraph to be a huge
> > improvement. Retrieving any part of the message from git and
> > dealing with MIME is expensive, anyways. I wouldn't expect it
> > to be a big (if any) improvement compared to POST-ing for the
> > mbox.gz (&x=m&t=1) endpoint with rt:$SINCE..
>
> Hmm... This didn't seem to do the right thing for me. For example, this
> thread:
>
> https://lore.kernel.org/lkml/20230327080502.GA570847@ziqianlu-desk2
>
> If I ask for any new messages in that thread since 20230327120000, I get
> nothing:
>
> curl -Sf -d '' \
>   'https://lore.kernel.org/all/?x=m&t=1&q=mid%3A20230327080502.GA570847@ziqianlu-desk2+AND+dt%3A20230328120000..'

Ugh, that's because the thread expansion (t=1) happens after
Xapian handles dt:/rt:/d:

I don't know if there's a good way to do that entirely within
Xapian via high-level Perl bindings.

Some options:
A) grab MSGID first, lookup THREADID for a given MSGID,
   use remaining query

   The problem is figuring out which parts of the query to
   handle, first.  Maybe a solution below...

B) add explicit before= and after= parameters which allow us
   to do filtering ourselves in the thread expansion phase

C) index References:/In-Reply-To: so searching `ref:$MSGID'
   can work.  This doesn't work for some MUAs and deep
   threads, though.

D) Support `thread:{subquery}' like notmuch.
   Thus `thread:{mid:$MSGID} AND dt:$START..' would communicate
   to Xapian what we want for A).

   I'm not sure this is doable unless using Xapian via C++,
   but I've been considering providing the option to use C++
   anyways to support less hacky approxidate query parsing.

   According to notmuch docs, it's expensive, though :<

I think it's possible to support /$INBOX/$MSGID/t.mbox.gz?q=...
for A) without too much difficulty.  I'll have to think
about it a bit...

D) is good for long-term consideration if proper timeouts can
be implemented.

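To make the ordering problem concrete, here is a toy model in Python.
It is only a sketch and not public-inbox code: the Msg class, both
function names, the thread IDs, and the dates of the reply and the
unrelated message are made up for illustration.  Only the root
Message-ID and the cutoff come from the example above.  The first
function filters by mid: and date before expanding threads, which is
the behavior described above; the second resolves the thread first and
applies the date bound afterwards, roughly what A) and B) aim for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    mid: str        # Message-ID
    thread_id: int  # hypothetical numeric thread identifier
    dt: str         # YYYYmmddHHMMSS, compares correctly as a string


def search_then_expand(msgs, mid, since):
    """Filter by mid: AND dt: first, then expand hits to whole threads."""
    hits = [m for m in msgs if m.mid == mid and m.dt >= since]
    tids = {m.thread_id for m in hits}
    return [m for m in msgs if m.thread_id in tids]


def expand_then_filter(msgs, mid, since):
    """Resolve the thread of mid first, then apply the date bound."""
    tids = {m.thread_id for m in msgs if m.mid == mid}
    return [m for m in msgs if m.thread_id in tids and m.dt >= since]


ROOT = "20230327080502.GA570847@ziqianlu-desk2"
SINCE = "20230328120000"
MSGS = [
    Msg(ROOT, 1, "20230327080502"),             # thread root, older than SINCE
    Msg("reply@example", 1, "20230328130000"),  # made-up reply, newer than SINCE
    Msg("other@example", 2, "20230328140000"),  # made-up unrelated thread
]

# The root fails the date test, so there is nothing left to expand.
print(search_then_expand(MSGS, ROOT, SINCE))
# Expanding first keeps the newer reply in the same thread.
print([m.mid for m in expand_then_filter(MSGS, ROOT, SINCE)])
```

With the first ordering the root message is dropped by the date range
before its thread is ever looked up, so the result is empty even though
the thread has a newer reply.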
> > The mbox.gz endpoints should be a bit more efficient for the
> > server than Atom feeds; decoding MIME and HTML escaping takes up
> > considerable CPU time.
>
> Good to know. I'm really looking for a way to ask the remote system "hey, is
> there anything new in this thread?" so that I can quickly ignore threads
> without any updates.

All the mbox.gz endpoints will 404 if there are no results, and
the `-f' flag of curl will ensure nothing's emitted to stdout
in that case.
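For a client that isn't curl, the same check could look roughly like
the Python sketch below.  The function names and the injectable
`opener' argument are my own assumptions for illustration, not an
existing API; the x=m and t=1 parameters and the empty POST body mirror
the curl command quoted above.  Keep in mind the caveat from earlier in
this mail: since thread expansion happens after the date filter, a
mid: AND dt: query can still 404 for a thread that has newer replies.

```python
import urllib.error
import urllib.parse
import urllib.request


def search_mbox_url(base, query):
    """Build a search URL asking for mbox.gz output (x=m) with threads (t=1)."""
    params = urllib.parse.urlencode({"x": "m", "t": "1", "q": query})
    return base.rstrip("/") + "/?" + params


def has_results(url, opener=urllib.request.urlopen):
    """POST an empty body (like `curl -d ''`); treat a 404 as "no results"."""
    req = urllib.request.Request(url, data=b"", method="POST")
    try:
        with opener(req) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return False
        raise  # any other HTTP error is a real failure, not "no results"
```

Usage would be something like
has_results(search_mbox_url("https://lore.kernel.org/all", "rt:$SINCE.."))
with $SINCE filled in by the caller.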