On 11/06/2016 03:18 PM, sebb wrote:
> Fields such as message-id are stored as text strings, but they are
> only really intended to be used as ids. They don't contain independent
> text parts.
>
> From what I have understood so far from reading the ES docs, such
> fields should be tagged as
>
> "index": "not_analyzed"
>
> AIUI this reduces the analysis overhead and storage requirements, and
> also makes it harder to find fields with
> This probably applies to other fields in "mbox":
>
> mid
> possibly in-reply-to
> also references
>
> And of course the auto-created fields such as attachments
>
> Likewise the doc types currently missing from setup.py:
>
> notifications
> account
> mailinglists
>
> These are internal use only so are not intended for searching.
>
> Or have I got this completely wrong?
>

message-id is already set to not be analyzed by the setup script (it's in the mappings the script sends to ES when creating the index). mid and in-reply-to should probably also be not analyzed, although mid is really just a copy of the doc ID, IIRC. The list ID is also not analyzed by default (as list_raw), and neither is the raw From address.
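
For illustration, here is a rough sketch of what such a mapping fragment could look like for the id-style fields mentioned in this thread. It is not copied from setup.py: the helper names (not_analyzed_string, build_mbox_mapping) and the overall layout are assumptions, and it uses the pre-5.x "string" type with "index": "not_analyzed" as quoted above (ES 5.x and later use the "keyword" type for this instead).

```python
import json

# Field names taken from this thread; whether each one should really be
# not_analyzed is still an open question here, so treat the list as a guess.
ID_FIELDS = ["message-id", "mid", "in-reply-to", "references"]


def not_analyzed_string():
    """Hypothetical helper: a string field stored verbatim, with no tokenizing."""
    return {"type": "string", "index": "not_analyzed"}


def build_mbox_mapping(id_fields=ID_FIELDS):
    """Hypothetical helper: build a mapping fragment for the "mbox" doc type."""
    return {
        "mbox": {
            "properties": {name: not_analyzed_string() for name in id_fields}
        }
    }


if __name__ == "__main__":
    # Print the fragment so it can be compared with what setup.py sends to ES.
    print(json.dumps(build_mbox_mapping(), indent=2))
```

A fragment like this would go under "mappings" in the body sent when the index is created. Changing the mapping of a field that already exists generally requires a reindex.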
