Are you looking for a hyphen-aware tokenizer, like this one? https://gist.github.com/jprante/cd120eac542ba6eec965
It is in my plugin bundle: https://github.com/jprante/elasticsearch-plugin-bundle

Jörg

On Wed, Nov 19, 2014 at 5:46 PM, horst knete <[email protected]> wrote:

> Hey guys,
>
> after working with the ELK stack for a while now, we still have a very
> annoying problem with the behavior of the standard analyzer: it
> splits terms into tokens, using hyphens or dots as delimiters.
>
> E.g. logsource:firewall-physical-management gets split into "firewall",
> "physical" and "management". On the one hand that's cool, because if you
> search for logsource:firewall you get all the events with firewall as a
> token in the field logsource.
>
> The downside of this behaviour is that if you do e.g. a "top 10" search
> on a field in Kibana, every token is counted as a whole term and
> ranked by its count:
> top 10:
> 1. firewall: 10
> 2. physical: 10
> 3. management: 10
>
> instead of top 10:
> 1. firewall-physical-management: 10
>
> In the standard mapping from Logstash this is solved using a .raw
> field set to "not_analyzed", but the downside is that you get 2 fields
> instead of one (even if it's a multi_field), and that is not great for
> Kibana users.
>
> So what we need is for logsource:firewall-physical-management to be
> tokenized into "firewall-physical-management", "firewall", "physical" and
> "management".
>
> I tried this using the word_delimiter token filter with the following
> mapping:
>
> "analysis" : {
>   "analyzer" : {
>     "my_analyzer" : {
>       "type" : "custom",
>       "tokenizer" : "whitespace",
>       "filter" : ["lowercase", "asciifolding", "my_worddelimiter"]
>     }
>   },
>   "filter" : {
>     "my_worddelimiter" : {
>       "type" : "word_delimiter",
>       "generate_word_parts": false,
>       "generate_number_parts": false,
>       "catenate_words": false,
>       "catenate_numbers": false,
>       "catenate_all": false,
>       "split_on_case_change": false,
>       "preserve_original": true,
>       "split_on_numerics": false,
>       "stem_english_possessive": true
>     }
>   }
> }
>
> But this unfortunately didn't do the job.
>
> During my research I saw that some other people have a similar problem,
> but except for some replacement suggestions, no real solution was found.
>
> If anyone has some ideas on how to start working on this, I would be very
> happy.
>
> Thanks.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/4094292c-057f-43d8-9af0-1ea83ad45a1c%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.
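
To make the requested behaviour concrete, here is a minimal Python sketch of the token stream asked for above: each whitespace-separated term is emitted whole, followed by its hyphen- or dot-separated parts. This is only an illustration of the desired output, not Elasticsearch code and not the implementation from the gist or the plugin bundle; the function name hyphen_aware_tokens is hypothetical.

```python
import re


def hyphen_aware_tokens(text):
    """Illustrative only (hypothetical helper, not an Elasticsearch API).

    Lowercase the input, split it on whitespace, and for every term emit
    the whole term first, then its parts split on hyphens or dots.
    """
    tokens = []
    for word in text.lower().split():
        tokens.append(word)  # keep the original term as one token
        parts = [p for p in re.split(r"[-.]", word) if p]
        if len(parts) > 1:  # only add parts when a split really happened
            tokens.extend(parts)
    return tokens


print(hyphen_aware_tokens("firewall-physical-management"))
# ['firewall-physical-management', 'firewall', 'physical', 'management']
```

In word_delimiter terms this seems to correspond to emitting the original token (preserve_original) together with the word parts (generate_word_parts). The quoted mapping keeps the original but switches word parts off, which may be why it did not produce the extra tokens; treat that as an assumption to verify with the analyze API rather than a confirmed diagnosis.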
