Are you looking for a hyphen-aware tokenizer, like this one?

https://gist.github.com/jprante/cd120eac542ba6eec965

It is included in my plugin bundle:

https://github.com/jprante/elasticsearch-plugin-bundle
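
Without a plugin, a plain word_delimiter setup can also produce both the full term and its parts. The catch in the mapping quoted below is that generate_word_parts is false while preserve_original is true, so only the original token survives. A sketch of the adjusted settings (untested against your exact version, filter names taken from your mapping):

```json
{
  "analysis": {
    "analyzer": {
      "my_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": ["lowercase", "asciifolding", "my_worddelimiter"]
      }
    },
    "filter": {
      "my_worddelimiter": {
        "type": "word_delimiter",
        "generate_word_parts": true,
        "preserve_original": true,
        "split_on_case_change": false,
        "split_on_numerics": false,
        "stem_english_possessive": true
      }
    }
  }
}
```

With these settings, "firewall-physical-management" should come out as "firewall-physical-management", "firewall", "physical" and "management".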

Jörg
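
P.S.: independent of Elasticsearch, the token set you describe below can be sketched in a few lines of Python — purely an illustration of the desired behaviour (keep the original term, then emit its hyphen-separated parts), not the actual word_delimiter implementation:

```python
def hyphen_tokens(term):
    """Mimic word_delimiter with preserve_original=true and
    generate_word_parts=true for hyphenated terms (illustration only)."""
    term = term.lower()
    parts = [p for p in term.split("-") if p]
    # Keep the original term first, then its individual parts.
    return [term] + parts if len(parts) > 1 else parts

print(hyphen_tokens("firewall-physical-management"))
# ['firewall-physical-management', 'firewall', 'physical', 'management']
```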

On Wed, Nov 19, 2014 at 5:46 PM, horst knete <[email protected]> wrote:

> Hey guys,
>
> after working with the ELK stack for a while now, we still have a very
> annoying problem with the behavior of the standard analyzer: it
> splits terms into tokens using hyphens and dots as delimiters.
>
> e.g. logsource:firewall-physical-management gets split into "firewall",
> "physical" and "management". On the one hand that is useful, because a
> search for logsource:firewall returns all events that contain firewall as a
> token in the logsource field.
>
> The downside of this behaviour shows up when you do e.g. a "top 10" search
> on a field in Kibana: each token is counted as a whole term and ranked by
> its count:
> top 10:
> 1. firewall : 10
> 2. physical : 10
> 3. management: 10
>
> instead of top 10:
> 1. firewall-physical-management: 10
>
> In the standard Logstash mapping this is solved with a .raw field mapped as
> "not_analyzed", but the downside is that you end up with two fields instead
> of one (even if it is a multi_field), which is not very convenient for
> Kibana users.
>
> So what we need is for logsource:firewall-physical-management to be
> tokenized into "firewall-physical-management", "firewall", "physical" and
> "management".
>
> I tried to achieve this with the word_delimiter token filter and the
> following mapping:
>
>  "analysis" : {
>    "analyzer" : {
>      "my_analyzer" : {
>        "type" : "custom",
>        "tokenizer" : "whitespace",
>        "filter" : ["lowercase", "asciifolding", "my_worddelimiter"]
>      }
>    },
>    "filter" : {
>      "my_worddelimiter" : {
>        "type" : "word_delimiter",
>        "generate_word_parts" : false,
>        "generate_number_parts" : false,
>        "catenate_words" : false,
>        "catenate_numbers" : false,
>        "catenate_all" : false,
>        "split_on_case_change" : false,
>        "preserve_original" : true,
>        "split_on_numerics" : false,
>        "stem_english_possessive" : true
>      }
>    }
>  }
>
> Unfortunately this did not do the job.
>
> While researching I saw that some others have a similar problem, but apart
> from some workaround suggestions, no real solution was found.
>
> If anyone has ideas on how to start working on this, I would be very
> happy.
>
> thanks.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/4094292c-057f-43d8-9af0-1ea83ad45a1c%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/4094292c-057f-43d8-9af0-1ea83ad45a1c%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

