Remember that Lucene is an inverted index, so you cannot deduce index
volume from input data volume. If a field has many millions of unique
values (high cardinality), it will add noticeably to the Elasticsearch
index volume, yes. If a field has only a few thousand unique values (low
cardinality), it adds close to nothing to index volume. You can even copy a
field to many other fields at close to no cost, although how much this adds
depends on the analyzer.
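
To illustrate the cardinality point, here is a toy sketch in Python. It is
not how Lucene stores anything on disk; it only assumes that each field
value is indexed as a single term (as with a not_analyzed field) and counts
how many distinct terms the index has to hold. The helper name
build_inverted_index and the sample values are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(values):
    """Map each distinct term to the list of document ids that contain it."""
    postings = defaultdict(list)
    for doc_id, value in enumerate(values):
        postings[value].append(doc_id)
    return dict(postings)


n_docs = 100_000

# Low cardinality: every document carries one of only five values.
low = build_inverted_index(["level-%d" % (i % 5) for i in range(n_docs)])

# High cardinality: every document carries its own unique value.
high = build_inverted_index(["request-%d" % i for i in range(n_docs)])

print(len(low))   # 5 distinct terms
print(len(high))  # 100000 distinct terms
```

Both inputs have the same number of documents, but the number of distinct
terms the index must keep differs by orders of magnitude.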

There is no alternative to a multi-field: if you want to search analyzed
forms, they must be in the index. If you want to examine the original input
values for frequency, e.g. in aggregations, they must be in the index
unchanged (not analyzed).
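
As a rough sketch of how the two forms are used together, the Python below
builds one search request body: a match query against the analyzed field and
a terms aggregation against the unchanged sub-field. The field names
"Message" and "Message.raw" come from the mapping quoted further down in this
thread; the search text, the aggregation name "top_messages" and the size of
10 are placeholders chosen for the example.

```python
import json

# Full-text search runs on the analyzed field, while counting whole
# strings runs on the not_analyzed sub-field from the multi-field mapping.
search_body = {
    "query": {
        "match": {"Message": "connection timeout"}  # placeholder search text
    },
    "aggs": {
        "top_messages": {  # arbitrary aggregation name
            "terms": {"field": "Message.raw", "size": 10}
        }
    },
}

# Print the JSON that would be sent as the body of a _search request.
print(json.dumps(search_body, indent=2))
```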

Jörg


On Tue, Oct 7, 2014 at 6:59 AM, Konstantin Erman <[email protected]> wrote:

> Doesn't it cause substantial inflation in the amount of data to be
> processed and stored at indexing time?
>
> As with most log aggregation systems, indexing happens many orders of
> magnitude more often than querying, and I'm concerned that using
> multi_fields instead of simple string fields everywhere may negatively
> impact indexing performance.
>
> Maybe there is a way to solve that problem at query time?
>
> ALSO, each field gets a "primary" name, plus other names with a dot and a
> suffix for the different representations. How do I select which
> representation should be used as the primary?
>
> On Monday, October 6, 2014 8:50:22 PM UTC-7, Doug Nelson wrote:
> > I use multi fields to have several different analysis types supported as
> > needed and also to have the raw version available, like in your example.
> >
> >
> >
> > On Monday, October 6, 2014 8:34:34 PM UTC-5, Konstantin Erman wrote:
> > I have documents in ES with the field "Message", which normally
> > holds a multi-word text string. I am trying to query it with Kibana to
> > see which strings occur in this field most frequently. What I actually
> > get back is a table showing the frequency of individual words, not of
> > the whole strings!
> >
> >
> > Now that I have started to understand something about ES, my guess is
> > that I am supposed to map the "Message" field as { "type": "string",
> > "index": "not_analyzed" }, so that it is not split into words. On the
> > other hand, I still want to be able to find documents by searching for
> > individual words from their message fields.
> >
> >
> > Next thought - multi_field "mapping":
> >     {
> >         "type": "string",
> >         "fields": {
> >             "raw": { "type": "string", "index": "not_analyzed" }
> >         }
> >     }
> >
> >
> > That way the analyzed Message field would serve normal queries, and when
> > I build my Terms panel I would use Message.raw instead.
> >
> >
> > I need confirmation that I'm moving in the right direction and that this
> > is the optimal and intended way to achieve the goal. It does not look
> > very elegant, which is why I'm asking. Maybe I am missing some other way
> > to search a string field by separate words while still treating it as a
> > whole for the purpose of counting. Please advise!
> > Konstantin
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/510ce9e3-fd67-41c3-b969-b25e32eef352%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>
