Rob Herring <[email protected]> wrote:
> On Wed, Jun 29, 2022 at 11:27 AM Eric Wong <[email protected]> wrote:
> > Rob Herring <[email protected]> wrote:
> > > On Wed, Jun 29, 2022 at 10:30 AM Eric Wong <[email protected]> wrote:
> > > > Rob Herring <[email protected]> wrote:
> > > > > Hi,
> > > > >
> > > > > I'm using lei with lore where I have 2 queries which overlap. Really,
> > > > > one is a subset of the other. On those overlapping threads, I'm
> > > > > finding that sometimes new messages are written to one mailbox and not
> > > > > the other. (Sometimes the messages may be missing from all
> > > > > mailboxes, too; I'm not certain.) Using --remote-fudge-time
> > > > > to force refetching seems to get the missing mails. I haven't found
> > > > > anything strange in timestamps of the missing mails, but otherwise am
> > > > > not sure how to debug this further. The queries are retrieving full
> > > > > threads and the missing mails are in the threads, but not direct
> > > > > matches to the queries. I realize that's not a lot of detail to go on.
> > > > > Suggestions on debugging this further?
> > > >
> > > > Is this with 1.8 or 1.7?
> > >
> > > Commit 68b53c888911 actually. So post 1.8.
> >
> > OK, thanks for that info.
> >
> > > > I forgot to note in the release notes, but there were some
> > > > SQLite usage-related fixes which could avoid missing messages.
> > > >
> > > > You'll need "lei daemon-kill" after upgrading to 1.8 to ensure
> > > > the new code is running.
> > >
> > > It's possible I haven't done that since updating though I do vaguely
> > > recall seeing something about needing to do that. Is there any way to
> > > tell before I restart it?
> >
> > Not really, but it's pretty cheap to restart (assuming there's no
> > long-running jobs).
> 
> I've restarted and just hit this again.

Ugh, sorry to hear that :<

> > > > What might be interesting is to use the URLs lei prints and
> > > > comparing the results w/o lei.
> 
> $ lei up --all
> # updating /home/rob/Mail/from-me
> # updating /home/rob/Mail/missing-cc
> # updating /home/rob/Mail/my-patches
> # updating /home/rob/Mail/pci
> # https://lore.kernel.org/all/ limiting to 2022-06-27 12:42 -0600 and newer
> # https://lore.kernel.org/all/ limiting to 2022-06-27  9:50 -0600 and newer
> # https://lore.kernel.org/all/ limiting to 2022-06-27 12:42 -0600 and newer
> # /usr/bin/curl -Sf -s -d ''
> https://lore.kernel.org/all/?x=m&t=1&q=(dt%3A20220529211430..+AND+(f%3Arobh%40kernel.org+OR+f%3Arobh%2Bdt%40kernel.org))+AND+dt%3A20220627184226..
> # /home/rob/.local/share/lei/store 144/144
> # /home/rob/.local/share/lei/store 3/3
> # /usr/bin/curl -Sf -s -d ''
> https://lore.kernel.org/all/?x=m&t=1&q=((dfn%3Adrivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation%2F*+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org+AND+rt%3A1640812470..)+AND+dt%3A20220627155025..
> # /usr/bin/curl -Sf -s -d ''
> https://lore.kernel.org/all/?x=m&t=1&q=(l%3Alinux-pci+dfn%3Adrivers%2Fpci%2Fcontroller+dt%3A20220529211430..)+AND+dt%3A20220627184226..
> # /home/rob/.local/share/lei/store 0/0
> # /home/rob/.local/share/lei/store 362/362
> # 0 written to /home/rob/Mail/missing-cc/ (0 matches)
> # https://lore.kernel.org/all/ 72/72
> # https://lore.kernel.org/all/ 4/4
> # https://lore.kernel.org/all/ 131/?
> # https://lore.kernel.org/all/ 184/?
> # https://lore.kernel.org/all/ 412/?
> # https://lore.kernel.org/all/ 603/?
> # https://lore.kernel.org/all/ 853/?
> # https://lore.kernel.org/all/ 1069/?
> # https://lore.kernel.org/all/ 1442/?
> # https://lore.kernel.org/all/ 1443/1443
> # 1 written to /home/rob/Mail/pci/ (75 matches)
> # 2 written to /home/rob/Mail/my-patches/ (148 matches)
> # 7 written to /home/rob/Mail/from-me/ (1805 matches)
> 
> 
> What I expected was 3 messages written to 'my-patches'.
> 
> I think the problem is simply that the missing new message doesn't
> match the query, but is a reply to a match. So a query limited to dates
> after the original match in the thread won't pick up anything. The 2nd
> URL above indeed only has 2 results. I guess I just have to fetch a
> wider window, like a month, every time? What's needed is a way to get
> any new messages in existing threads. I don't suppose there's an
> efficient way to do that?

No, I don't think so.  I think this is a separate issue in lei...
"t=1" in the remote query expands threads in a time-agnostic
way, so I don't think that's the problem (though I may be wrong...).
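
For reference, a minimal Python sketch (not part of lei) that decodes the
second curl URL from the log above, to show what the remote is actually
asked: "t=1" is present and lei has appended a "dt:" lower bound to the
saved search. The -0600 offset is taken from the "limiting to" lines in
the log; treating "dt:" as UTC is an assumption, though it agrees with
the "9:50 -0600" printed there.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# Second URL printed by "lei up --all" in the log above (the my-patches search).
URL = ("https://lore.kernel.org/all/?x=m&t=1&q=((dfn%3Adrivers+OR+dfn%3Aarch"
       "+OR+dfn%3ADocumentation%2F*+OR+dfn%3Ainclude+OR+dfn%3Ascripts)"
       "+AND+f%3Arobh%40kernel.org+AND+rt%3A1640812470..)"
       "+AND+dt%3A20220627155025..")

# parse_qs undoes the percent-encoding and turns "+" back into spaces.
params = {k: v[0] for k, v in parse_qs(urlsplit(URL).query).items()}
query = params["q"]

# The trailing "dt:YYYYMMDDhhmmss.." term is the window lei added.
stamp = query.rsplit("dt:", 1)[1].rstrip(".")
since_utc = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
# Assumption: dt: is UTC; -0600 is the local offset shown in the log.
since_local = since_utc.astimezone(timezone(timedelta(hours=-6)))

print("t =", params["t"])
print("q =", query)
print("window starts:", since_local.isoformat())
```

Pasting the decoded query into the lore search box (or re-running the
curl command as printed) is one way to compare the remote's answer with
what lei wrote to the Maildir.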

I'll have to check more closely this week (still stuck with POP3
user account/storage issues :<)

> > > >
> > > > I'll have to double-check if overlapping affects things, but it
> > > > shouldn't; since the dedupe logic is per-output.
> > > >
> > > > Is this exclusively with HTTPS endpoints and writing to Maildirs
> > > > (or something else?)
> > >
> > > Yes. It's querying lore and writing to a maildir. Here's one of the 
> > > queries:
> > >
> > > [lei]
> > >         q = (dfn:drivers OR dfn:arch OR dfn:Documentation/* OR
> > > dfn:include OR dfn:scripts) AND \
> > >          f:robh@kernel.org AND rt:6.month.ago..
> > > [lei "q"]
> > >         include = https://lore.kernel.org/all/
> > >         external = 1
> > >         local = 1
> > >         remote = 1
> > >         threads = 1
> > >         dedupe = mid
> > >         output = maildir:/home/rob/Mail/my-patches
> >
> > Fwiw, dedupe based on mid could be vulnerable to spoofing, which
> > is why `content' is the default.  But yes, in the past, I've
> > noticed some messages to [email protected] not showing up,
> > though not recently (I guess lack of activity here is a culprit :x)
> 
> Does 'content' ignore trailers that mailman lists like to add? I think
> I switched because of that.

No, unfortunately not.  Hopefully the admins can be convinced to
get rid of trailers (I'm happy vger did so a few years back).
But I'd rather deal with duplicates than miss messages (there
have been legitimate messages in the past which reused msgids,
unfortunately).
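
A toy Python model of that trade-off (this is not lei's dedupe code;
the hashed fields, the message dicts and the example.com addresses are
all made up for illustration): keying on Message-ID collapses a
trailer-modified copy, but also drops an unrelated message that reuses
the msgid, while keying on content keeps all three.

```python
import hashlib

def content_key(msg):
    # Assumption: a stand-in for content-based dedupe that hashes a couple
    # of headers plus the body. The fields lei really hashes are not
    # reproduced here.
    data = "\n".join([msg["from"], msg["subject"], msg["body"]])
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def mid_key(msg):
    # Message-ID-only dedupe: the msgid alone decides what is a duplicate.
    return msg["mid"]

def dedupe(msgs, key):
    # Keep the first message seen for each key, preserving order.
    seen = {}
    for msg in msgs:
        seen.setdefault(key(msg), msg)
    return list(seen.values())

direct = {"mid": "<1@example.com>", "from": "a@example.com",
          "subject": "[PATCH] foo", "body": "patch text\n"}
# Same message as delivered by a list that appends a trailer to the body.
via_list = dict(direct, body=direct["body"] + "___\nlist footer\n")
# A different, legitimate message that happens to reuse the same msgid.
reused = {"mid": "<1@example.com>", "from": "b@example.com",
          "subject": "unrelated", "body": "other text\n"}

msgs = [direct, via_list, reused]
by_mid = dedupe(msgs, mid_key)
by_content = dedupe(msgs, content_key)
print(len(by_mid), len(by_content))
```

In this model "mid" loses the unrelated message along with the
duplicate, and "content" keeps the trailer-modified copy as an extra.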
