On 6 November 2016 at 14:26, Daniel Gruno <[email protected]> wrote:
> On 11/06/2016 03:18 PM, sebb wrote:
>> Fields such as message-id are stored as text strings, but they are
>> only really intended to be used as ids. They don't contain independent
>> text parts.
>>
>> From what I have understood so far from reading the ES docs, such
>> fields should be tagged as
>>
>> "index": "not_analyzed"
>>
>> AIUI this reduces the analysis overhead and storage requirements, and
>> also makes it harder to find fields with
>> This probably applies to other fields in "mbox":
>>
>> mid
>> possibly in-reply-to
>> also references
>>
>> And of course the auto-created fields such as attachments
>>
>> Likewise the doc types currently missing from setup.py:
>>
>> notifications
>> account
>> mailinglists
>>
>> These are internal use only so are not intended for searching.
>>
>> Or have I got this completely wrong?
>>
>
> message-id is set to not be analyzed, by the setup script (it's in the
> mappings it sends to ES when creating the index).

Yes, I know, that was why I mentioned it, but my email was not at all clear.

> mid and in-reply-to
> should probably also be not analyzed, although mid is really a copy of
> the doc ID, IIRC.
> the list ID is also not analyzed by default (as
> list_raw), neither is the raw from address

Yes, I noticed those raw fields.
However I'm not sure why one would want to analyse the LID, so why is
there a list field as well as list_raw?

Since 'from' may contain free text as well as the email address it
makes sense to analyse it; I'm not sure why one needs from_raw as
well, unless one needs to match against the whole field.
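
To make the distinction discussed above concrete, here is a rough sketch of how id-like fields could be marked "not_analyzed" while free-text fields keep the default analysis. It is only an illustration and not the actual contents of setup.py: the helper build_mbox_mapping, the two field lists and the choice of which fields go in which list are assumptions made for this example. It uses the pre-5.x Elasticsearch "string" type, where string fields are analyzed unless told otherwise.

```python
# Hypothetical sketch only; this is NOT the project's real setup.py.
# Assumes the pre-5.x Elasticsearch "string" type, where the "index"
# option is "analyzed" by default and can be set to "not_analyzed".

# Fields used purely as identifiers or whole-value matches (assumed split).
ID_FIELDS = ["message-id", "mid", "in-reply-to", "references",
             "list_raw", "from_raw"]

# Fields that may contain free text and so keep the default analysis.
TEXT_FIELDS = ["from", "list"]


def build_mbox_mapping(id_fields, text_fields):
    """Build a mapping body for an "mbox" doc type (hypothetical helper)."""
    properties = {}
    for name in id_fields:
        # Stored as a single term: exact matches only, no tokenizing.
        properties[name] = {"type": "string", "index": "not_analyzed"}
    for name in text_fields:
        # No "index" key, so the default ("analyzed") applies.
        properties[name] = {"type": "string"}
    return {"mbox": {"properties": properties}}


MBOX_MAPPING = build_mbox_mapping(ID_FIELDS, TEXT_FIELDS)
```

A dict of this shape would go under the "mappings" key of the body sent when creating the index. Keeping both from and from_raw in the sketch reflects the one use case mentioned above: the raw copy allows matching against the whole field, while the analyzed copy allows searching on parts of it.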
