On Mon, 14 Nov 2016, David Bremner <da...@tethera.net> wrote:
> the idea is that you can run
>
> % notmuch search re:subject:<your-favourite-regexp>
> % notmuch search re:from:<your-favourite-regexp>
>
> or
>
> % notmuch search subject:"your usual phrase search"
> % notmuch search from:"usual phrase search"
>
> This should also work with bindings, since it extends the query parser.
>
> This is trivial to extend for other value slots, but currently the only
> value slots are date, message_id, from, subject, and last_mod. Date is
> already searchable, and message_id is not obviously useful to regex
> match.
>
> This was originally written by Austin Clements, and ported to Xapian
> field processors (from Austin's custom query parser) by yours truly.
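For reference, the patch's RegexpFieldProcessor receives everything after "re:" as one string and splits it at the first colon into a field name and a regular expression. A minimal sketch of that split in isolation, assuming a hypothetical helper split_re_argument (not part of the patch); unlike the patch, it throws when no colon is present:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper, for illustration only: split "<field>:<regexp>"
// at the FIRST ':' so that colons inside the regexp are preserved.
static std::pair<std::string, std::string>
split_re_argument (const std::string &arg)
{
    std::string::size_type pos = arg.find (':');

    // Assumption of this sketch: a missing colon is reported as an
    // error rather than being passed on.
    if (pos == std::string::npos)
        throw std::invalid_argument ("expected <field>:<regexp>, got '" + arg + "'");

    return std::make_pair (arg.substr (0, pos), arg.substr (pos + 1));
}
```

Called with "from:C.* Wo", the helper returns the field "from" and the pattern "C.* Wo". Only the first colon is significant, so "subject:a:b" gives the pattern "a:b".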
I can't say I did a detailed review of all the Xapian bits and pieces
here, but I didn't spot anything obviously wrong either.

I'd prefer the documentation to be explicit about "re:subject:" and
"re:from:" instead of using the generic "re:<field>:", which I think is
bound to confuse people.

The _ suffixes instead of prefixes in variable names seemed a bit odd,
but I have no strong opinion on that.

I played around with this a bit, and it seemed to work. Unsurprisingly,
getting the quoting right was the hardest part. Even though I know how
this works under the hood, it took me a while to realize that you have
to use 're:"subject:<regex with spaces>"' to make it work. (I kept
trying 're:subject:"<regex with spaces>"'.) I don't know if there's
anything we could really do about this.

BR,
Jani.

> ---
>
> rebase of id:1467034387-16885-1-git-send-email-da...@tethera.net against
> master
>
>  doc/man7/notmuch-search-terms.rst |  17 +++++-
>  lib/Makefile.local                |   1 +
>  lib/database-private.h            |   1 +
>  lib/database.cc                   |   5 ++
>  lib/regexp-fields.cc              | 125 ++++++++++++++++++++++++++++++++++++++
>  lib/regexp-fields.h               |  77 +++++++++++++++++++++++
>  test/T630-regexp-query.sh         |  91 +++++++++++++++++++++++++++
>  7 files changed, 316 insertions(+), 1 deletion(-)
>  create mode 100644 lib/regexp-fields.cc
>  create mode 100644 lib/regexp-fields.h
>  create mode 100755 test/T630-regexp-query.sh
>
> diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
> index de93d73..4c7afc2 100644
> --- a/doc/man7/notmuch-search-terms.rst
> +++ b/doc/man7/notmuch-search-terms.rst
> @@ -60,6 +60,8 @@ indicate user-supplied values):
>
>  - property:<key>=<value>
>
> +- re:{subject,from}:<regex>
> +
>  The **from:** prefix is used to match the name or address of the sender
>  of an email message.
>
> @@ -146,6 +148,12 @@ The **property:** prefix searches for messages with a particular
>  (and extensions) to add metadata to messages. A given key can be
>  present on a given message with several different values.
>
> +The **re:<field>:** prefix can be used to restrict the results to
> +those whose <field> matches the given regular expression (see
> +**regex(7)**). Regular expression searches are only available if
> +notmuch is built with **Xapian Field Processors** (see below), and
> +currently only for the Subject and From fields.
> +
>  Operators
>  ---------
>
> @@ -220,13 +228,19 @@ Boolean and Probabilistic Prefixes
>  ----------------------------------
>
>  Xapian (and hence notmuch) prefixes are either **boolean**, supporting
> -exact matches like "tag:inbox" or **probabilistic**, supporting a more flexible **term** based searching. The prefixes currently supported by notmuch are as follows.
> +exact matches like "tag:inbox" or **probabilistic**, supporting a more
> +flexible **term** based searching. Certain **special** prefixes are
> +processed by notmuch in a way not stricly fitting either of Xapian's
> +built in styles. The prefixes currently supported by notmuch are as
> +follows.
>
>
>  Boolean
>    **tag:**, **id:**, **thread:**, **folder:**, **path:**, **property:**
>  Probabilistic
>    **from:**, **to:**, **subject:**, **attachment:**, **mimetype:**
> +Special
> +  **query:**, **re:<field>**
>
>  Terms and phrases
>  -----------------
> @@ -396,6 +410,7 @@ Currently the following features require field processor support:
>
>  - non-range date queries, e.g. "date:today"
>  - named queries e.g. "query:my_special_query"
> +- regular expression searches, e.g. "re:subject:^\\[SPAM\\]"
>
>  SEE ALSO
>  ========
> diff --git a/lib/Makefile.local b/lib/Makefile.local
> index 3d1030a..ccd32ab 100644
> --- a/lib/Makefile.local
> +++ b/lib/Makefile.local
> @@ -53,6 +53,7 @@ libnotmuch_cxx_srcs = \
>      $(dir)/query.cc \
>      $(dir)/query-fp.cc \
>      $(dir)/config.cc \
> +    $(dir)/regexp-fields.cc \
>      $(dir)/thread.cc
>
>  libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
> diff --git a/lib/database-private.h b/lib/database-private.h
> index ca71a92..900a989 100644
> --- a/lib/database-private.h
> +++ b/lib/database-private.h
> @@ -186,6 +186,7 @@ struct _notmuch_database {
>  #if HAVE_XAPIAN_FIELD_PROCESSOR
>      Xapian::FieldProcessor *date_field_processor;
>      Xapian::FieldProcessor *query_field_processor;
> +    Xapian::FieldProcessor *re_field_processor;
>  #endif
>      Xapian::ValueRangeProcessor *last_mod_range_processor;
>  };
> diff --git a/lib/database.cc b/lib/database.cc
> index 2d19f20..851a62d 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -21,6 +21,7 @@
>  #include "database-private.h"
>  #include "parse-time-vrp.h"
>  #include "query-fp.h"
> +#include "regexp-fields.h"
>  #include "string-util.h"
>
>  #include <iostream>
> @@ -1042,6 +1043,8 @@ notmuch_database_open_verbose (const char *path,
>          notmuch->query_parser->add_boolean_prefix("date", notmuch->date_field_processor);
>          notmuch->query_field_processor = new QueryFieldProcessor (*notmuch->query_parser, notmuch);
>          notmuch->query_parser->add_boolean_prefix("query", notmuch->query_field_processor);
> +        notmuch->re_field_processor = new RegexpFieldProcessor (*notmuch->query_parser, notmuch);
> +        notmuch->query_parser->add_boolean_prefix("re", notmuch->re_field_processor);
>  #endif
>          notmuch->last_mod_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_LAST_MOD, "lastmod:");
>
> @@ -1138,6 +1141,8 @@ notmuch_database_close (notmuch_database_t *notmuch)
>      notmuch->date_field_processor = NULL;
>      delete notmuch->query_field_processor;
>      notmuch->query_field_processor = NULL;
> +    delete notmuch->re_field_processor;
> +    notmuch->re_field_processor = NULL;
>  #endif
>
>      return status;
> diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
> new file mode 100644
> index 0000000..4d3d972
> --- /dev/null
> +++ b/lib/regexp-fields.cc
> @@ -0,0 +1,125 @@
> +/* regexp-fields.cc - "re:" field processor glue
> + *
> + * This file is part of notmuch.
> + *
> + * Copyright © 2015 Austin Clements
> + * Copyright © 2016 David Bremner
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 3 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see https://www.gnu.org/licenses/ .
> + *
> + * Author: Austin Clements <acleme...@csail.mit.edu>
> + *         David Bremner <da...@tethera.net>
> + */
> +
> +#include "regexp-fields.h"
> +#include "notmuch-private.h"
> +
> +#if HAVE_XAPIAN_FIELD_PROCESSOR
> +RegexpPostingSource::RegexpPostingSource (Xapian::valueno slot, const std::string &regexp)
> +    : slot_ (slot)
> +{
> +    int err = regcomp (&regexp_, regexp.c_str (), REG_EXTENDED | REG_NOSUB);
> +
> +    if (err != 0) {
> +        size_t len = regerror (err, &regexp_, NULL, 0);
> +        char *buffer = new char[len];
> +        std::string msg;
> +        (void) regerror (err, &regexp_, buffer, len);
> +        msg.assign (buffer, len);
> +        delete buffer;
> +
> +        throw Xapian::QueryParserError (msg);
> +    }
> +}
> +
> +RegexpPostingSource::~RegexpPostingSource ()
> +{
> +    regfree (&regexp_);
> +}
> +
> +void
> +RegexpPostingSource::init (const Xapian::Database &db)
> +{
> +    db_ = db;
> +    it_ = db_.valuestream_begin (slot_);
> +    end_ = db.valuestream_end (slot_);
> +    started_ = false;
> +}
> +
> +Xapian::doccount
> +RegexpPostingSource::get_termfreq_min () const
> +{
> +    return 0;
> +}
> +
> +Xapian::doccount
> +RegexpPostingSource::get_termfreq_est () const
> +{
> +    return get_termfreq_max () / 2;
> +}
> +
> +Xapian::doccount
> +RegexpPostingSource::get_termfreq_max () const
> +{
> +    return db_.get_value_freq (slot_);
> +}
> +
> +Xapian::docid
> +RegexpPostingSource::get_docid () const
> +{
> +    return it_.get_docid ();
> +}
> +
> +bool
> +RegexpPostingSource::at_end () const
> +{
> +    return it_ == end_;
> +}
> +
> +void
> +RegexpPostingSource::next (unused (double min_wt))
> +{
> +    if (started_ && ! at_end ())
> +        ++it_;
> +    started_ = true;
> +
> +    for (; ! at_end (); ++it_) {
> +        std::string value = *it_;
> +        if (regexec (&regexp_, value.c_str (), 0, NULL, 0) == 0)
> +            break;
> +    }
> +}
> +
> +static Xapian::valueno
> +_find_slot (std::string prefix)
> +{
> +    if (prefix == "from")
> +        return NOTMUCH_VALUE_FROM;
> +    else if (prefix == "subject")
> +        return NOTMUCH_VALUE_SUBJECT;
> +    else
> +        throw Xapian::QueryParserError ("unsupported regexp field '" + prefix + "'");
> +}
> +
> +Xapian::Query
> +RegexpFieldProcessor::operator() (const std::string & str)
> +{
> +    size_t pos = str.find_first_of (':');
> +    std::string prefix = str.substr (0, pos);
> +    std::string regexp = str.substr (pos + 1);
> +
> +    postings = new RegexpPostingSource (_find_slot (prefix), regexp);
> +    return Xapian::Query (postings);
> +}
> +#endif
> diff --git a/lib/regexp-fields.h b/lib/regexp-fields.h
> new file mode 100644
> index 0000000..2c9c2d7
> --- /dev/null
> +++ b/lib/regexp-fields.h
> @@ -0,0 +1,77 @@
> +/* regex-fields.h - xapian glue for semi-bruteforce regexp search
> + *
> + * This file is part of notmuch.
> + *
> + * Copyright © 2015 Austin Clements
> + * Copyright © 2016 David Bremner
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 3 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see https://www.gnu.org/licenses/ .
> + *
> + * Author: Austin Clements <acleme...@csail.mit.edu>
> + *         David Bremner <da...@tethera.net>
> + */
> +
> +#ifndef NOTMUCH_REGEXP_FIELDS_H
> +#define NOTMUCH_REGEXP_FIELDS_H
> +#if HAVE_XAPIAN_FIELD_PROCESSOR
> +#include <sys/types.h>
> +#include <regex.h>
> +#include <xapian.h>
> +#include "notmuch-private.h"
> +
> +/* A posting source that returns documents where a value matches a
> + * regexp.
> + */
> +class RegexpPostingSource : public Xapian::PostingSource
> +{
> + protected:
> +    const Xapian::valueno slot_;
> +    regex_t regexp_;
> +    Xapian::Database db_;
> +    bool started_;
> +    Xapian::ValueIterator it_, end_;
> +
> +/* No copying */
> +    RegexpPostingSource (const RegexpPostingSource &);
> +    RegexpPostingSource &operator= (const RegexpPostingSource &);
> +
> + public:
> +    RegexpPostingSource (Xapian::valueno slot, const std::string &regexp);
> +    ~RegexpPostingSource ();
> +    void init (const Xapian::Database &db);
> +    Xapian::doccount get_termfreq_min () const;
> +    Xapian::doccount get_termfreq_est () const;
> +    Xapian::doccount get_termfreq_max () const;
> +    Xapian::docid get_docid () const;
> +    bool at_end () const;
> +    void next (unused (double min_wt));
> +};
> +
> +
> +class RegexpFieldProcessor : public Xapian::FieldProcessor {
> + protected:
> +    Xapian::QueryParser &parser;
> +    notmuch_database_t *notmuch;
> +    RegexpPostingSource *postings = NULL;
> +
> + public:
> +    RegexpFieldProcessor (Xapian::QueryParser &parser_, notmuch_database_t *notmuch_)
> +        : parser(parser_), notmuch(notmuch_) { };
> +
> +    ~RegexpFieldProcessor () { delete postings; };
> +
> +    Xapian::Query operator()(const std::string & str);
> +};
> +#endif
> +#endif /* NOTMUCH_REGEXP_FIELDS_H */
> diff --git a/test/T630-regexp-query.sh b/test/T630-regexp-query.sh
> new file mode 100755
> index 0000000..3bbe47c
> --- /dev/null
> +++ b/test/T630-regexp-query.sh
> @@ -0,0 +1,91 @@
> +#!/usr/bin/env bash
> +test_description='regular expression searches'
> +. ./test-lib.sh || exit 1
> +
> +add_email_corpus
> +
> +
> +if [ $NOTMUCH_HAVE_XAPIAN_FIELD_PROCESSOR -eq 1 ]; then
> +
> +    notmuch search --output=messages from:cworth > cworth.msg-ids
> +
> +    test_begin_subtest "regexp from search, case sensitive"
> +    notmuch search --output=messages re:from:carl > OUTPUT
> +    test_expect_equal_file /dev/null OUTPUT
> +
> +    test_begin_subtest "empty regexp or query"
> +    notmuch search --output=messages re:from:carl or from:cworth > OUTPUT
> +    test_expect_equal_file cworth.msg-ids OUTPUT
> +
> +    test_begin_subtest "non-empty regexp and query"
> +    notmuch search re:from:cworth and subject:patch > OUTPUT
> +    cat <<EOF > EXPECTED
> +thread:0000000000000008 2009-11-18 [1/2] Carl Worth| Alex Botero-Lowry; [notmuch] [PATCH] Error out if no query is supplied to search instead of going into an infinite loop (attachment inbox unread)
> +thread:0000000000000007 2009-11-18 [1/2] Carl Worth| Ingmar Vanhassel; [notmuch] [PATCH] Typsos (inbox unread)
> +thread:0000000000000018 2009-11-18 [1/2] Carl Worth| Jan Janak; [notmuch] [PATCH] Older versions of install do not support -C. (inbox unread)
> +thread:0000000000000017 2009-11-18 [1/2] Carl Worth| Keith Packard; [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (inbox unread)
> +thread:0000000000000014 2009-11-18 [2/5] Carl Worth| Mikhail Gusarov, Keith Packard; [notmuch] [PATCH 1/2] Close message file after parsing message headers (inbox unread)
> +thread:0000000000000001 2009-11-18 [1/1] Stewart Smith; [notmuch] [PATCH] Fix linking with gcc to use g++ to link in C++ libs. (inbox unread)
> +EOF
> +    test_expect_equal_file EXPECTED OUTPUT
> +
> +    test_begin_subtest "regexp from search, duplicate term search"
> +    notmuch search --output=messages re:from:cworth > OUTPUT
> +    test_expect_equal_file cworth.msg-ids OUTPUT
> +
> +    test_begin_subtest "long enough regexp matches only desired senders"
> +    notmuch search --output=messages 're:"from:C.* Wo"' > OUTPUT
> +    test_expect_equal_file cworth.msg-ids OUTPUT
> +
> +    test_begin_subtest "shorter regexp matches one more sender"
> +    notmuch search --output=messages 're:"from:C.* W"' > OUTPUT
> +    (echo id:1258544095-16616-1-git-send-email-ch...@chris-wilson.co.uk ; cat cworth.msg-ids) > EXPECTED
> +    test_expect_equal_file EXPECTED OUTPUT
> +
> +    test_begin_subtest "regexp subject search, non-ASCII"
> +    notmuch search --output=messages re:subject:accentué > OUTPUT
> +    echo id:877h1wv7mg....@inf-8657.int-evry.fr > EXPECTED
> +    test_expect_equal_file EXPECTED OUTPUT
> +
> +    test_begin_subtest "regexp subject search, punctuation"
> +    notmuch search re:subject:\'X\' > OUTPUT
> +    cat <<EOF > EXPECTED
> +thread:0000000000000017 2009-11-18 [2/2] Keith Packard, Carl Worth; [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (inbox unread)
> +EOF
> +    test_expect_equal_file EXPECTED OUTPUT
> +
> +    test_begin_subtest "regexp subject search, no punctuation"
> +    notmuch search re:subject:X > OUTPUT
> +    cat <<EOF > EXPECTED
> +thread:0000000000000017 2009-11-18 [2/2] Keith Packard, Carl Worth; [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (inbox unread)
> +thread:000000000000000f 2009-11-18 [4/4] Jjgod Jiang, Alexander Botero-Lowry; [notmuch] Mac OS X/Darwin compatibility issues (inbox unread)
> +EOF
> +    test_expect_equal_file EXPECTED OUTPUT
> +
> +    test_begin_subtest "combine regexp from and subject"
> +    notmuch search re:subject:-C and re:from:.an.k > OUTPUT
> +    cat <<EOF > EXPECTED
> +thread:0000000000000018 2009-11-17 [1/2] Jan Janak| Carl Worth; [notmuch] [PATCH] Older versions of install do not support -C. (inbox unread)
> +EOF
> +    test_expect_equal_file EXPECTED OUTPUT
> +
> +    test_begin_subtest "bad subprefix"
> +    notmuch search 're:unsupported:.*' 1>OUTPUT 2>&1
> +    cat <<EOF > EXPECTED
> +notmuch search: A Xapian exception occurred
> +A Xapian exception occurred performing query: unsupported regexp field 'unsupported'
> +Query string was: re:unsupported:.*
> +EOF
> +    test_expect_equal_file EXPECTED OUTPUT
> +
> +    test_begin_subtest "regexp error reporting"
> +    notmuch search 're:from:unbalanced[' 1>OUTPUT 2>&1
> +    cat <<EOF > EXPECTED
> +notmuch search: A Xapian exception occurred
> +A Xapian exception occurred performing query: Invalid regular expression
> +Query string was: re:from:unbalanced[
> +EOF
> +    test_expect_equal_file EXPECTED OUTPUT
> +fi
> +
> +test_done
> --
> 2.10.2
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> https://notmuchmail.org/mailman/listinfo/notmuch
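The matching behaviour these tests exercise (case-sensitive, unanchored, POSIX extended syntax) follows from calling regcomp with REG_EXTENDED | REG_NOSUB and then regexec on each stored value. A minimal standalone sketch of that technique, without Xapian, assuming a hypothetical RAII wrapper PosixRegex and a hypothetical helper filter_matching, neither of which is in the patch. It keeps the regerror message in a std::vector, so no manual array delete is needed:

```cpp
#include <cassert>
#include <regex.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wrapper, for illustration only: owns one compiled
// POSIX extended regular expression.
class PosixRegex {
    regex_t re_;

  public:
    explicit PosixRegex (const std::string &pattern)
    {
        int err = regcomp (&re_, pattern.c_str (), REG_EXTENDED | REG_NOSUB);
        if (err != 0) {
            // First call reports the buffer size, including the NUL.
            size_t len = regerror (err, &re_, NULL, 0);
            std::vector<char> buf (len);
            regerror (err, &re_, buf.data (), buf.size ());
            throw std::invalid_argument (std::string (buf.data ()));
        }
    }

    ~PosixRegex () { regfree (&re_); }

    // No copying: regex_t must be freed exactly once.
    PosixRegex (const PosixRegex &) = delete;
    PosixRegex &operator= (const PosixRegex &) = delete;

    // True if the pattern matches anywhere in value (unanchored).
    bool matches (const std::string &value) const
    {
        return regexec (&re_, value.c_str (), 0, NULL, 0) == 0;
    }
};

// Hypothetical stand-in for the scan over a value slot: keep the
// values that match, in their original order.
static std::vector<std::string>
filter_matching (const std::vector<std::string> &values, const std::string &pattern)
{
    PosixRegex re (pattern);
    std::vector<std::string> hits;

    for (const std::string &value : values)
        if (re.matches (value))
            hits.push_back (value);
    return hits;
}
```

Under these assumptions, a pattern such as "carl" does not match a value spelled "Carl Worth", which mirrors the "case sensitive" subtest, and a pattern such as "unbalanced[" makes the constructor throw with the message text that regerror supplies.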