Hi Ivan,

I have resolved the problem. The template was wrong. It works fine now, and the simpleQueryString query works fine too.

Cheers,
Marc
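[Editor's note: for readers with the same symptom, a minimal sketch of the kind of simple_query_string request being discussed here, run against the msg/excp fields mapped in the template quoted below. The index name and search terms are placeholders, not values from the thread.]

# index name and search terms are made up; the msg/excp fields come from the logstash template below
curl -XGET 'localhost:9200/logstash-2014.09.01/_search?pretty=1' -d '{
  "query": {
    "simple_query_string": {
      "query": "mymdb onmessage",
      "fields": ["msg", "excp"],
      "default_operator": "and"
    }
  }
}'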
On Tuesday, September 2, 2014 7:54:27 PM UTC+2, Ivan Brusic wrote:

Hard to say without looking at your query, but perhaps you are experiencing query parser issues. The query string query uses the standard query parser, which might not tokenize terms the way your custom tokenizer does. Try using match queries, which do not use the query parser, to see if that "fixes" the problem. Of course, you will not have the query syntax at your disposal, but you can find workarounds.

--
Ivan

On Mon, Sep 1, 2014 at 4:17 AM, Marc <mn.o...@googlemail.com> wrote:

Hi Ivan,

Using a test index and the analyze API, I was now able to create a config which is fine for me... theoretically.

{
  "template": "logstash-*",
  "settings": {
    "analysis": {
      "filter": {
        "my_word_delimiter": {
          "type": "word_delimiter",
          "preserve_original": "true"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["standard", "lowercase", "stop", "my_word_delimiter", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_default_": {
      "properties": {
        "excp": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "my_analyzer"
        },
        "msg": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

The problem now is that as soon as I activate this for the two fields and have a new logstash index created, I cannot use a simpleQueryString query to retrieve any results. It won't find anything via the REST API. Using the standard logstash template and mapping, it works fine.

Have you observed anything similar?

Thx
Marc

On Friday, August 29, 2014 6:49:41 PM UTC+2, Ivan Brusic wrote:

That output does not look like something generated by the standard analyzer, since it contains uppercase letters and various non-word characters such as '='.

Your two analysis requests will differ since the second one contains the default word_delimiter filter instead of your custom my_word_delimiter. What you are trying to achieve is somewhat difficult, but you can get there if you keep on tweaking. :) Try using a pattern tokenizer instead of the whitespace tokenizer if you want more control over word boundaries.

--
Ivan

On Fri, Aug 29, 2014 at 1:48 AM, Marc <mn.o...@googlemail.com> wrote:

Hi Ivan,

thanks again. I have tried that and found a reasonable combination. Nevertheless, when I now try to use the analyze API with an index that has said analyzer defined via template, it doesn't seem to apply.

This is the complete template:

{
  "template": "bogstash-*",
  "settings": {
    "index.number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "msg_excp_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filters": ["word_delimiter", "lowercase", "asciifolding", "shingle", "standard"]
        }
      },
      "filters": {
        "my_word_delimiter": {
          "type": "word_delimiter",
          "preserve_original": "true"
        },
        "my_asciifolding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      }
    }
  },
  "mappings": {
    "_default_": {
      "properties": {
        "@excp": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "msg_excp_analyzer"
        },
        "@msg": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "msg_excp_analyzer"
        }
      }
    }
  }
}

I create the index bogstash-1.
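[Editor's note: the thread does not show how the template was installed or the index created; on Elasticsearch 1.x these steps would presumably look roughly like the sketch below. The file name bogstash-template.json is an assumption.]

# register the index template (the template JSON from the message above, saved locally)
curl -XPUT 'localhost:9200/_template/bogstash' -d @bogstash-template.json

# create a matching index so the template is applied
curl -XPUT 'localhost:9200/bogstash-1'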
Now I test the following:

curl -XGET 'localhost:9200/bogstash-1/_analyze?analyzer=msg_excp_analyzer&pretty=1' -d 'Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22 (updated attributes=gps_lng: 183731222/ gps_lat: 289309222/ )'

and it returns:

{
  "tokens" : [ {
    "token" : "Service=MyMDB.onMessage",
    "start_offset" : 0,
    "end_offset" : 23,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "appId=cs",
    "start_offset" : 24,
    "end_offset" : 32,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "Times=Me:22/Total:22",
    "start_offset" : 33,
    "end_offset" : 53,
    "type" : "word",
    "position" : 3
  }, {
    "token" : "(updated",
    "start_offset" : 54,
    "end_offset" : 62,
    "type" : "word",
    "position" : 4
  }, {
    "token" : "attributes=gps_lng:",
    "start_offset" : 63,
    "end_offset" : 82,
    "type" : "word",
    "position" : 5
  }, {
    "token" : "183731222/",
    "start_offset" : 83,
    "end_offset" : 93,
    "type" : "word",
    "position" : 6
  }, {
    "token" : "gps_lat:",
    "start_offset" : 94,
    "end_offset" : 102,
    "type" : "word",
    "position" : 7
  }, {
    "token" : "289309222/",
    "start_offset" : 103,
    "end_offset" : 113,
    "type" : "word",
    "position" : 8
  }, {
    "token" : ")",
    "start_offset" : 114,
    "end_offset" : 115,
    "type" : "word",
    "position" : 9
  } ]
}

Which is the output of a standard analyzer.

Giving the tokenizer and filters in the analyze API directly works fine:

curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=lowercase,word_delimiter,shingle,asciifolding,standard&pretty=1' -d 'Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22 (updated attributes=gps_lng: 183731222/ gps_lat: 289309222/ )'

This results in:

{
  "tokens" : [ {
    "token" : "service",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "service mymdb",
    "start_offset" : 0,
    "end_offset" : 13,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "mymdb",
    "start_offset" : 8,
    "end_offset" : 13,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "mymdb onmessage",
    "start_offset" : 8,
    "end_offset" : 23,
    "type" : "shingle",
    "position" : 2
  }, {
    "token" : "onmessage",
    "start_offset" : ...

(remainder of the output truncated)
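[Editor's note: the thread never states why the index-level _analyze call above produced bare whitespace-tokenized output, but one plausible cause, not confirmed here, is the key naming in the bogstash template: Elasticsearch expects token filter definitions under analysis.filter and the analyzer's filter list under a filter key, whereas the template uses filters in both places, leaving msg_excp_analyzer as a whitespace tokenizer with no filters. A corrected settings fragment would look roughly like the sketch below, with the custom my_word_delimiter and my_asciifolding filters referenced as Ivan suggests; this is an illustration, not the fix Marc ultimately used.]

"settings": {
  "analysis": {
    "filter": {
      "my_word_delimiter": {
        "type": "word_delimiter",
        "preserve_original": true
      },
      "my_asciifolding": {
        "type": "asciifolding",
        "preserve_original": true
      }
    },
    "analyzer": {
      "msg_excp_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": ["my_word_delimiter", "lowercase", "my_asciifolding", "shingle", "standard"]
      }
    }
  }
}

With keys named this way, an _analyze call against a bogstash index with analyzer=msg_excp_analyzer should return lowercased, delimiter-split tokens rather than the default-looking output shown above.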