On Mon, Nov 7, 2016 at 9:23 AM sebb <[email protected]> wrote:

> On 7 November 2016 at 01:36, John D. Ament <[email protected]> wrote:
> > On Sun, Nov 6, 2016 at 8:22 PM sebb <[email protected]> wrote:
> >
> >> On 6 November 2016 at 14:37, John D. Ament <[email protected]> wrote:
> >> > On Sun, Nov 6, 2016 at 9:27 AM Daniel Gruno <[email protected]> wrote:
> >> >
> >> >> On 11/06/2016 03:18 PM, sebb wrote:
> >> >> > Fields such as message-id are stored as text strings, but they are
> >> >> > only really intended to be used as ids. They don't contain independent
> >> >> > text parts.
> >> >> >
> >> >> > From what I have understood so far from reading the ES docs, such
> >> >> > fields should be tagged as
> >> >> >
> >> >> > "index": "not_analyzed"
> >> >> >
> >> >> > AIUI this reduces the analysis overhead and storage requirements, and
> >> >> > also makes it harder to find fields with
> >> >> > This probably applies to other fields in "mbox":
> >> >> >
> >> >> > mid
> >> >> > possibly in-reply-to
> >> >> > also references
> >> >> >
> >> >> > And of course the auto-created fields such as attachments
> >> >> >
> >> >> > Likewise the doc types currently missing from setup.py:
> >> >> >
> >> >> > notifications
> >> >> > account
> >> >> > mailinglists
> >> >> >
> >> >> > These are internal use only so are not intended for searching.
> >> >> >
> >> >> > Or have I got this completely wrong?
> >> >> >
> >> >>
> >> >> message-id is set to not be analyzed by the setup script (it's in the
> >> >> mappings it sends to ES when creating the index). mid and in-reply-to
> >> >> should probably also be not analyzed, although mid is really a copy of
> >> >> the doc ID, IIRC. The list ID is also not analyzed by default (as
> >> >> list_raw), and neither is the raw from address.
> >> >>
> >> >
> >> > So I notice the query process is an arbitrary full text query, which runs
> >> > against _all.
> >> >
> >>
> https://github.com/apache/incubator-ponymail/blob/master/site/api/lib/elastic.lua#L44
> >>
> >> Huh?
> >>
> >> The query starts:
> >>
> >> local url = config.es_url .. doc .. "/_search?q="..query
> >>
> >> where
> >>
> >> es_url = "http://localhost:9200/ponymail/";
> >>
> >> and
> >>
> >> doc = "mbox" by default.
> >>
> >> Where does the _all come in?
> >>
> >
> > When you do a query string query in Elasticsearch (reference:
> > https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html)
> > the default field unless specified is "_all".  I can't find anything in the
> > pony code that changes this field.  As a result, it's going to search _all
> > by default.
>
> stats.lua changes the generic query into:
>
> "query_string": {
>   "default_field": "subject",
>   "query": "(from:\"QUERY\") OR (subject:\"QUERY\") OR (body:\"QUERY\")"
> }
>
> Which does not use the _all field AFAICT
>

OK, this is what I was looking for (but couldn't find).  But to reiterate
my notes from above: this means that the only mappings that matter for
search are the ones for these fields (from, subject and body).  The other
field mappings don't matter for this query.
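
To make that concrete, here is a minimal Python sketch (not the Pony Mail
code, which is Lua). It assumes the query body has the shape quoted from
stats.lua above. build_query and fields_referenced are hypothetical helper
names, and the quote escaping is my assumption about what the phrase syntax
needs.

```python
import json
import re


def build_query(user_query: str) -> dict:
    """Build a query_string body shaped like the one quoted from stats.lua."""
    # Assumption: escaping backslashes and double quotes is enough to keep
    # the user's text inside the quoted phrase.
    escaped = user_query.replace("\\", "\\\\").replace('"', '\\"')
    clauses = [f'({field}:"{escaped}")' for field in ("from", "subject", "body")]
    return {
        "query_string": {
            "default_field": "subject",
            "query": " OR ".join(clauses),
        }
    }


def fields_referenced(body: dict) -> set:
    """Return the fields this query can touch: explicit prefixes plus the default."""
    qs = body["query_string"]
    explicit = set(re.findall(r'\((\w+):"', qs["query"]))
    return explicit | {qs["default_field"]}


if __name__ == "__main__":
    body = build_query("hello")
    print(json.dumps(body, indent=2))
    print(sorted(fields_referenced(body)))
```

Every clause carries an explicit field prefix and default_field is set, so
_all never appears among the fields the query references.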


>
> >
> >>
> >> > unless I need to dig into it a bit further to see if there's something
> >> > building up query a bit different.
> >> >
> >> > So... that means most of these mappings are moot.
> >>
>
