Hey guys,
after working with the ELK stack for a while now, we still have a very
annoying problem with the behavior of the standard analyzer: it
splits terms into tokens using hyphens or dots as delimiters.
E.g. logsource:firewall-physical-management gets split into "firewall",
"physical" and "management". On the one hand that's cool, because if you
search for logsource:firewall you get all the events with firewall as a
token in the field logsource.
The downside of this behaviour shows up when you do e.g. a "top 10 search"
on a field in Kibana: every token is counted as a whole term and ranked by
its count:
top 10:
1. firewall : 10
2. physical : 10
3. management: 10
instead of top 10:
1. firewall-physical-management: 10
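
To make the counting effect concrete, here is a minimal Python sketch. It does not talk to Elasticsearch; split_tokens is a hypothetical, rough stand-in for the analyzer that simply lowercases and splits on hyphens and dots, and the ten identical events are made-up sample data.

```python
import re
from collections import Counter


def split_tokens(value):
    # Hypothetical stand-in for the analyzer: lowercase, then split on
    # hyphens and dots. The real analyzer does more than this.
    return [t for t in re.split(r"[-.]", value.lower()) if t]


# Made-up sample data: ten events with the same logsource value.
events = ["firewall-physical-management"] * 10

# Counting per token, which is what a terms count on the analyzed field sees.
analyzed = Counter(tok for value in events for tok in split_tokens(value))

# Counting per whole value, which is what a not_analyzed field sees.
not_analyzed = Counter(events)

print(analyzed.most_common(10))
print(not_analyzed.most_common(10))
```

The first counter holds three separate entries with a count of 10 each, the second holds a single entry for the full value.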
In the standard mapping from Logstash this is solved by using a .raw
field set to "not_analyzed", but the downside is that you get two fields
instead of one (even if it's a multi_field), and that is not very
convenient for Kibana users.
So what we need is for logsource:firewall-physical-management to be
tokenized into "firewall-physical-management", "firewall", "physical" and
"management".
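
As a sketch of the target behaviour only (plain Python, not an Elasticsearch analyzer), the desired token stream could be described like this. The function name desired_tokens and the choice of hyphen and dot as the only delimiters are assumptions made for illustration.

```python
import re


def desired_tokens(value):
    # Keep the original value and also emit its parts, split on
    # hyphens and dots (assumed delimiters for this illustration).
    value = value.lower()
    parts = [p for p in re.split(r"[-.]", value) if p]
    # If nothing was split, emit the value only once.
    if parts == [value]:
        return [value]
    return [value] + parts


print(desired_tokens("firewall-physical-management"))
print(desired_tokens("firewall"))
```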
I tried this using the word_delimiter token filter with the following
analysis settings:
"analysis" : {
  "analyzer" : {
    "my_analyzer" : {
      "type" : "custom",
      "tokenizer" : "whitespace",
      "filter" : ["lowercase", "asciifolding", "my_worddelimiter"]
    }
  },
  "filter" : {
    "my_worddelimiter" : {
      "type" : "word_delimiter",
      "generate_word_parts": false,
      "generate_number_parts": false,
      "catenate_words": false,
      "catenate_numbers": false,
      "catenate_all": false,
      "split_on_case_change": false,
      "preserve_original": true,
      "split_on_numerics": false,
      "stem_english_possessive": true
    }
  }
}
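
One variation that may be worth testing: with "generate_word_parts" set to false, the word_delimiter filter is documented as not emitting the sub-words, so only the original token would remain. The Python sketch below just builds the same settings with that one flag flipped to true and prints them as JSON. It has not been run against a cluster, so whether this yields exactly the four tokens wanted is an assumption to verify, for example with the _analyze API.

```python
import json

# Same filter as in the settings above, with one change:
# generate_word_parts is true. Assumption (not verified here): combined
# with preserve_original this emits the sub-words and the original token.
my_worddelimiter = {
    "type": "word_delimiter",
    "generate_word_parts": True,
    "generate_number_parts": False,
    "catenate_words": False,
    "catenate_numbers": False,
    "catenate_all": False,
    "split_on_case_change": False,
    "preserve_original": True,
    "split_on_numerics": False,
    "stem_english_possessive": True,
}

analysis = {
    "analysis": {
        "analyzer": {
            "my_analyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["lowercase", "asciifolding", "my_worddelimiter"],
            }
        },
        "filter": {"my_worddelimiter": my_worddelimiter},
    }
}

# Print the settings as JSON so they can be pasted into an index definition.
print(json.dumps(analysis, indent=2))
```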
But unfortunately this didn't do the job.
During my research I saw that some other people have a similar problem,
but except for some replacement suggestions, no real solution was found.
If anyone has some ideas on how to start working on this, I would be very
happy.
Thanks.