Konstantin Ryabitsev <mri...@kernel.org> wrote:
> On Sat, Apr 27, 2024 at 07:19:21AM GMT, Eric Wong wrote:
> > Correct, public-inbox currently won't index every header due to
> > cost, false positives, and otherwise lack of usefulness (general
> > gibberish from DKIM sigs, various UUIDs, etc).
> > 
> > So it doesn't currently know about "X-stable:"
> > 
> > I started working on making headers indexing configurable last
> > year, but didn't hear a response from the person that
> > potentially was interested:
> > 
> > https://public-inbox.org/meta/20231120032132.M610564@dcvr/
> > 
> > Right now, indexing new headers + validations can be maintained
> > as a Perl module in the public-inbox codebase.
> > 
> > For lore, it'd make sense to be able to configure a bunch (or
> > all) inboxes at once instead of the per-inbox configuration in
> > my proposed RFC.
> > 
> > At minimum, one would have to know:
> > 
> > 1) the mail header name (e.g. `X-stable')
> > 2) the search prefix to use (e.g. `xstable:') # can't use dash `-' AFAIK
> > 3) the type of header value (phrase, string, sortable numeric, etc...)
> 
> I'm whole-heartedly for this! This ties nicely to my b4 work where I'd 
> like to be able to identify code-review trailers sent for a specific 
> patch, even if that patch itself is not on lore. For example, this could 
> be a patch that is part of a pull-request on a git forge, but we'd still 
> like to be able to collect and find code-review trailers for it when a 
> maintainer applies it.

OK, a more configurable version is available on a per-inbox basis:

https://public-inbox.org/meta/20240508110957.3108196-...@80x24.org/

But that's a PITA to configure with hundreds of inboxes and
doesn't have extindex support, yet.

I made it share logic with the old altid code, so I'll also be
getting altid into extindex, since ISTR users wanting to be able
to look up gmane stuff via extindex.

And it also works with the new C++ xap_helper process
(which I'll use for threadid: support (still working on that...)).

> I'm perfectly fine with it only being a string, honestly.

Yeah, though there are 3 ways of indexing strings, currently :x
I've decided to keep some options open and support boolean_term,
text, and phrase for now.

boolean_term is the cheapest and probably best for exactly
matching labels/enums and such.  The others may work better
for more complex texts (comma-delimited labels, maybe).
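
To make the trade-off concrete, here is a toy model in Python. It is
only a sketch, not public-inbox or Xapian code: every function name is
hypothetical, and it assumes "text" means per-word terms without
positions while "phrase" also records word positions.

```python
import re


def tokenize(value):
    """Split a header value into lowercase word tokens (toy tokenizer)."""
    return re.findall(r"\w+", value.lower())


def index_header(value, kind):
    """Return what a toy indexer would store for one header value.

    The kind names mirror the three options mentioned above; the
    storage layout is an assumption made for illustration only.
    """
    if kind == "boolean_term":
        # The whole (trimmed) value is one exact-match term: cheapest.
        return {value.strip()}
    if kind == "text":
        # One term per word, no positions.
        return set(tokenize(value))
    if kind == "phrase":
        # (position, word) pairs so word order can be checked later.
        return set(enumerate(tokenize(value)))
    raise ValueError(f"unknown kind: {kind!r}")


def match_boolean(terms, query):
    """Exact match against the stored value."""
    return query in terms


def match_text(terms, query):
    """Every query word must be present, in any order."""
    words = tokenize(query)
    return bool(words) and all(w in terms for w in words)


def match_phrase(postings, query):
    """Query words must appear adjacently and in order."""
    words = tokenize(query)
    if not words:
        return False
    positions = {}
    for pos, word in postings:
        positions.setdefault(word, set()).add(pos)
    return any(
        all(start + i in positions.get(w, ()) for i, w in enumerate(words))
        for start in positions.get(words[0], ())
    )
```

In this model, a value like "needs backport to 6.1" indexed as a phrase
matches the query "backport to" but not "to backport", whereas indexing
it as text would match both.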

> > So probably just supporting strings and/or phrases to start...
> > 
> > Validation to prevent poisoning by malicious/broken senders can
> > be useful in some cases (and the reason the RFC was a per use
> > case Perl module).  That said, I'm not sure if much validation
> > is necessary for X-stable: headers or if just any text is fine.
> 
> I'd let the consumer clients worry about it.

Agreed.
