On Sun, Nov 6, 2016 at 8:22 PM sebb <[email protected]> wrote:

> On 6 November 2016 at 14:37, John D. Ament <[email protected]> wrote:
> > On Sun, Nov 6, 2016 at 9:27 AM Daniel Gruno <[email protected]> wrote:
> >
> >> On 11/06/2016 03:18 PM, sebb wrote:
> >> > Fields such as message-id are stored as text strings, but they are
> >> > only really intended to be used as ids. They don't contain independent
> >> > text parts.
> >> >
> >> > From what I have understood so far from reading the ES docs, such
> >> > fields should be tagged as
> >> >
> >> > "index": "not_analyzed"
> >> >
> >> > AIUI this reduces the analysis overhead and storage requirements, and
> >> > also makes it harder to find fields with
> >> > This probably applies to other fields in "mbox":
> >> >
> >> > mid
> >> > possibly in-reply-to
> >> > also references
> >> >
> >> > And of course the auto-created fields such as attachments
> >> >
> >> > Likewise the doc types currently missing from setup.py:
> >> >
> >> > notifications
> >> > account
> >> > mailinglists
> >> >
> >> > These are internal use only so are not intended for searching.
> >> >
> >> > Or have I got this completely wrong?
> >> >
> >>
> >> message-id is set to not be analyzed, by the setup script (it's in the
> >> mappings it sends to ES when creating the index). mid and in-reply-to
> >> should probably also be not analyzed, although mid is really a copy of
> >> the doc ID, IIRC. the list ID is also not analyzed by default (as
> >> list_raw), neither is the raw from address
> >>
> >
> > So I notice the query process is an arbitrary full text query, which runs
> > against _all.
> >
> > https://github.com/apache/incubator-ponymail/blob/master/site/api/lib/elastic.lua#L44
>
> Huh?
>
> The query starts:
>
> local url = config.es_url .. doc .. "/_search?q="..query
>
> where
>
> es_url = "http://localhost:9200/ponymail/";
>
> and
>
> doc = "mbox" by default.
>
> Where does the _all come in?
>

When you do a query string query in Elasticsearch (reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html)
the default field, unless one is specified, is "_all". I can't find anything in the
Pony Mail code that changes this default. As a result, it's going to search _all
by default.
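
To make that concrete, here is a rough sketch (in Python rather than the
Lua used in elastic.lua) of how the search URL quoted above is put together,
and how the "df" (default field) URI search parameter could be used to point
the query at one field instead. The helper name search_url is made up for
illustration, and using "message-id" as the default field is only an example;
the es_url and doc values are the ones quoted in this thread.

```python
from urllib.parse import urlencode

# Values quoted earlier in this thread.
ES_URL = "http://localhost:9200/ponymail/"
DOC = "mbox"


def search_url(query, default_field=None):
    """Build a URI search URL (hypothetical helper, not Pony Mail code).

    With no default_field, only "q" is sent, so Elasticsearch falls back
    to its own default field for unqualified terms.
    """
    params = {"q": query}
    if default_field is not None:
        # "df" names the field used for terms that carry no field prefix.
        params["df"] = default_field
    return ES_URL + DOC + "/_search?" + urlencode(params)


# Unqualified query: the default field is decided by Elasticsearch.
print(search_url("hello"))
# Same query, restricted to one field via df.
print(search_url("hello", "message-id"))
```

The same restriction can also be written inside the query itself with a
field prefix (field:term), which is why it matters whether anything in the
code adds one.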


>
> > unless
> > I need to dig into it a bit further to see if there's something building up
> > query a bit different.
> >
> > So... that means most of these mappings are moot.
>

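For reference, the kind of mapping being discussed would look roughly like
the sketch below. This is not copied from setup.py; the helper name
id_field_mapping is made up, the field list is just the id-like fields named
in this thread, and the "string" type with "index": "not_analyzed" assumes
the Elasticsearch 2.x-era mapping syntax.

```python
import json

# Id-like "mbox" fields mentioned in this thread (illustrative list only).
ID_FIELDS = ["message-id", "mid", "in-reply-to", "references"]


def id_field_mapping(doc_type, fields):
    """Return a mapping body that marks each field as not analyzed.

    Hypothetical helper; assumes ES 2.x-style string fields.
    """
    properties = {
        name: {"type": "string", "index": "not_analyzed"} for name in fields
    }
    return {doc_type: {"properties": properties}}


mapping = id_field_mapping("mbox", ID_FIELDS)
print(json.dumps(mapping, indent=2))
```

A mapping like this only affects how those individual fields are indexed,
which is the reason the default search field matters here.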