On 6 November 2016 at 14:26, Daniel Gruno <[email protected]> wrote:
> On 11/06/2016 03:18 PM, sebb wrote:
>> Fields such as message-id are stored as text strings, but they are
>> only really intended to be used as ids. They don't contain independent
>> text parts.
>>
>> From what I have understood so far from reading the ES docs, such
>> fields should be tagged as
>>
>> "index": "not_analyzed"
>>
>> AIUI this reduces the analysis overhead and storage requirements, and
>> also makes it harder to find fields with
>> This probably applies to other fields in "mbox":
>>
>> mid
>> possibly in-reply-to
>> also references
>>
>> And of course the auto-created fields such as attachments
>>
>> Likewise the doc types currently missing from setup.py:
>>
>> notifications
>> account
>> mailinglists
>>
>> These are internal use only so are not intended for searching.
>>
>> Or have I got this completely wrong?
>>
>
> message-id is set to not be analyzed, by the setup script (it's in the
> mappings it sends to ES when creating the index).

Yes, I know, that was why I mentioned it, but my email was not at all clear.
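
For concreteness, here is a minimal sketch of what I mean for the id-like
fields. It assumes the ES 1.x/2.x string-field syntax quoted above; the helper
name and the exact field list are mine for illustration, not what setup.py
currently sends.

```python
import json

# Id-like fields named in this thread; the list is illustrative only.
ID_FIELDS = ["message-id", "mid", "in-reply-to", "references"]


def id_field_mappings(fields):
    """Build mapping properties that index each field verbatim (hypothetical helper)."""
    # "not_analyzed" keeps the whole value as a single term, so no tokenizing.
    return {name: {"type": "string", "index": "not_analyzed"} for name in fields}


# Assumed shape of a mapping body for the "mbox" doc type.
mbox_mapping = {"mbox": {"properties": id_field_mappings(ID_FIELDS)}}

print(json.dumps(mbox_mapping, indent=2, sort_keys=True))
```

The resulting dict would be passed as the mappings body when the index is
created; existing indexes would presumably need a reindex to pick it up.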

> mid and in-reply-to
> should probably also be not analyzed, although mid is really a copy of
> the doc ID, IIRC.

> the list ID is also not analyzed by default (as
> list_raw), neither is the raw from address

Yes, I noticed those raw fields.
However, I'm not sure why one would want to analyse the list ID, so why
is there a list field as well as list_raw?

Since 'from' may contain free text as well as the email address it
makes sense to analyse it; I'm not sure why one needs from_raw as
well, unless one needs to match against the whole field.
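
To show the distinction I have in mind, a rough sketch follows. It assumes
'from' is an analyzed string and from_raw is its not_analyzed copy; the two
query helper names and the sample sender are hypothetical.

```python
# Assumed mapping pair: analyzed text plus an unanalyzed copy of the same value.
from_fields = {
    "from": {"type": "string"},  # analyzed by default in ES 1.x/2.x
    "from_raw": {"type": "string", "index": "not_analyzed"},
}


def exact_sender_query(sender):
    """Term query: matches only when from_raw equals the whole string."""
    return {"query": {"term": {"from_raw": sender}}}


def text_sender_query(words):
    """Match query: the input is analyzed, so part of a name can match."""
    return {"query": {"match": {"from": words}}}


# Example sender string, made up for illustration.
whole = exact_sender_query("A. Sender <sender@example.org>")
partial = text_sender_query("sender")
```

So from_raw only earns its keep if something filters or aggregates on the
complete header value.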
