>>>>> "DB" == David Bremner <da...@tethera.net> writes:

DB> Todd <t...@electricoding.com> writes:
>> Adds the indexing and removes the broken test flag
>> ---
>>  lib/database.cc        |  1 +
>>  lib/index.cc           | 10 ++++++++++
>>  test/T190-multipart.sh |  4 ----
>>  3 files changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/database.cc b/lib/database.cc
>> index 0d2c417..3974e2e 100644
>> --- a/lib/database.cc
>> +++ b/lib/database.cc
>> @@ -254,6 +254,7 @@ static prefix_t PROBABILISTIC_PREFIX[]= {
>>      { "from", "XFROM" },
>>      { "to", "XTO" },
>>      { "attachment", "XATTACHMENT" },
>> +    { "mimetype", "XMIMETYPE"},
>>      { "subject", "XSUBJECT"},
>>  };

DB> I think the commit message should articulate why we are indexing this as
DB> a probabilistic prefix, rather than as a boolean prefix. In particular,
DB> this gives people a last chance to complain.

DB> The reference I know is http://xapian.org/docs/queryparser.html

DB> If I understand correctly (it would be great if you could test this,
DB> Todd), with a probabilistic prefix,

DB>     mimetype:pdf

DB> will match

DB>     application/pdf
DB>     image/pdf
DB>     application/x-pdf
DB>     application/x-ext-pdf

DB> but not

DB>     application/x-bzpdf
DB>     application/x-gzpdf
DB>     application/x-xzpdf

I just tested, and it does work this way with your examples.

I *believe*, from reading the docs, that Xapian treats full MIME-type
queries as phrase searches anyway, because of the embedded slashes.
From http://xapian.org/docs/queryparser.html:

    A phrase surrounded with double quotes ("") matches documents
    containing that exact phrase. Hyphenated words are also treated as
    phrases, as are cases such as filenames and email addresses
    (e.g. /etc/passwd or presid...@whitehouse.gov).

I think this automatic phrasing will give us good behavior for the
kinds of queries people will typically run.

DB> On the whole, this is probably more beneficial than bad. The downside
DB> of probabilistic prefixes/fields is that they are not "anchored", so
DB> there is no easy way to distinguish

DB>     application/pdf

DB> from

DB>     pdf
DB>     application/x-pdf

DB> I guess in a perfect world this would also be explained in
DB> notmuch-search-terms(7), but that's pretty much orthogonal to this
DB> series.

If separate messages with application/pdf and application/x-pdf are
indexed, then:

    mimetype:application/x-pdf   finds only the application/x-pdf message
    mimetype:application/pdf     finds only the application/pdf message
    mimetype:pdf                 finds both messages

I am fairly sure that this behaviour is a result of the automatic
phrasing mentioned above.

- Todd

DB> d
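
To make the matching behaviour discussed above concrete, here is a small C++ sketch of a toy model, not Xapian itself. It assumes that a MIME type is split into lowercase terms at every non-alphanumeric character and that a query containing a slash or hyphen is matched as a phrase (adjacent terms, in order). Both are simplifying assumptions; real Xapian tokenization has more rules. The names `tokenize`, `phrase_match`, `mimetype_matches` and `check_thread_examples` are made up for this illustration and are not part of notmuch or Xapian.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Assumed, simplified term generation: lowercase the input and split it at
// every non-alphanumeric character ("application/x-pdf" -> application, x, pdf).
std::vector<std::string> tokenize(const std::string &text)
{
    std::vector<std::string> terms;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            terms.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        terms.push_back(current);
    return terms;
}

// Phrase match: the query terms must occur adjacently and in the same order
// somewhere in the indexed terms. A single-term query is a plain term match.
bool phrase_match(const std::vector<std::string> &indexed,
                  const std::vector<std::string> &query)
{
    if (query.empty() || query.size() > indexed.size())
        return false;
    for (std::size_t start = 0; start + query.size() <= indexed.size(); ++start) {
        bool all_equal = true;
        for (std::size_t i = 0; i < query.size(); ++i) {
            if (indexed[start + i] != query[i]) {
                all_equal = false;
                break;
            }
        }
        if (all_equal)
            return true;
    }
    return false;
}

// Hypothetical helper: under this toy model, would "mimetype:<query>" match
// a message whose indexed MIME type is `mime_type`?
bool mimetype_matches(const std::string &mime_type, const std::string &query)
{
    return phrase_match(tokenize(mime_type), tokenize(query));
}

// Checks the toy model against the outcomes reported in this thread.
void check_thread_examples()
{
    // mimetype:pdf matches any type that contains the separate term "pdf"...
    assert(mimetype_matches("application/pdf", "pdf"));
    assert(mimetype_matches("application/x-pdf", "pdf"));
    assert(mimetype_matches("application/x-ext-pdf", "pdf"));
    // ...but not types where "pdf" is only part of a longer term.
    assert(!mimetype_matches("application/x-bzpdf", "pdf"));
    // Full MIME types behave as phrases, so they are effectively exact here.
    assert(mimetype_matches("application/x-pdf", "application/x-pdf"));
    assert(!mimetype_matches("application/pdf", "application/x-pdf"));
    assert(!mimetype_matches("application/x-pdf", "application/pdf"));
}
```

Calling `check_thread_examples()` from a `main` of your own exercises the cases above. That the model agrees with the tested results is consistent with the automatic-phrasing explanation, but it does not prove that this is what Xapian does internally.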
_______________________________________________
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch