Hi Ivan,

I have resolved the problem: the template was wrong. Everything works fine 
now, including the simpleQueryString query.
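
In case it is useful to others, a query of roughly this shape (index name 
and search term are illustrative; the field comes from the template further 
down in the thread) now returns the expected hits:

curl -XGET 'localhost:9200/logstash-2014.09.02/_search?pretty=1' -d '{
    "query": {
        "simple_query_string": {
            "query": "gps_lng",
            "fields": ["msg"]
        }
    }
}'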


Cheers,
Marc

On Tuesday, September 2, 2014 7:54:27 PM UTC+2, Ivan Brusic wrote:
>
> Hard to say without looking at your query, but perhaps you are 
> experiencing query parser issues. The query string query uses the standard 
> query parser, which might not tokenize terms the way your custom 
> tokenizer does. Try using match queries, which do not use the query 
> parser, to see if that "fixes" the problem. Of course, you will no longer 
> have the query syntax at your disposal, but you can find workarounds.
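>
> For example, a match query along these lines (index and field names 
> illustrative) goes straight to the field's analyzer and skips the query 
> parser entirely:
>
> curl -XGET 'localhost:9200/logstash-2014.09.02/_search?pretty=1' -d '{
>     "query": {
>         "match": {
>             "msg": "Service=MyMDB.onMessage"
>         }
>     }
> }'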
>
> -- 
> Ivan
>
>
> On Mon, Sep 1, 2014 at 4:17 AM, Marc <mn.o...@googlemail.com> wrote:
>
> Hi Ivan,
>
> Using a test index and the analyze API, I was now able to create a config 
> which works for me... theoretically.
> {
>     "template": "logstash-*",
>     "settings": {
>         "analysis": {
>             "filter": {
>                 "my_word_delimiter": {
>                     "type": "word_delimiter",
>                     "preserve_original": "true"
>                 }
>             },
>             "analyzer": {
>                 "my_analyzer": {
>                     "type": "custom",
>                     "tokenizer": "standard",
>                     "filter": ["standard",
>                     "lowercase",
>                     "stop",
>                     "my_word_delimiter",
>                     "asciifolding"]
>                 }
>             }
>         }
>     },
>     "mappings": {
>         "_default_": {
>             "properties": {
>                 "excp": {
>                     "type": "string",
>                     "index": "analyzed",
>                     "analyzer": "my_analyzer"
>                 },
>                 "msg": {
>                     "type": "string",
>                     "index": "analyzed",
>                     "analyzer": "my_analyzer"
>                 }
>             }
>         }
>     }
> }
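>
> (For reference, the analyzer of an index created from this template can be 
> verified with a call along these lines; index name and sample input are 
> illustrative:
> curl -XGET 'localhost:9200/logstash-2014.09.02/_analyze?analyzer=my_analyzer&pretty=1' -d 'Service=MyMDB.onMessage appId=cs'
> )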
> The problem now is that as soon as I activate this for the two fields and 
> a new logstash index is created, I cannot use a simpleQueryString query to 
> retrieve any results.
> It won't find anything via the REST API. Using the standard logstash 
> template and mapping, it works fine.
> Have you observed anything similar?
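>
> (For debugging, whether the template and analyzer were actually applied to 
> the new index can be checked via its mapping; index name illustrative:
> curl -XGET 'localhost:9200/logstash-2014.09.02/_mapping?pretty=1'
> )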
>
> Thx
> Marc
>
> On Friday, August 29, 2014 6:49:41 PM UTC+2, Ivan Brusic wrote:
>
> That output does not look like something generated by the standard 
> analyzer, since it contains uppercase letters and various non-word 
> characters such as '='.
>
> Your two analysis requests will differ since the second one contains the 
> default word_delimiter filter instead of your custom my_word_delimiter. 
> What you are trying to achieve is somewhat difficult, but you can get there 
> if you keep on tweaking. :) Try using a pattern tokenizer instead of the 
> whitespace tokenizer if you want more control over word boundaries.
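>
> A custom pattern tokenizer would be defined roughly like this (the pattern 
> itself is only an illustration, splitting on whitespace, slashes and 
> parentheses):
>
> "tokenizer": {
>     "my_pattern_tokenizer": {
>         "type": "pattern",
>         "pattern": "[\\s/()]+"
>     }
> }
>
> and then referenced from the custom analyzer via "tokenizer": 
> "my_pattern_tokenizer".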
>
> -- 
> Ivan
>
>
> On Fri, Aug 29, 2014 at 1:48 AM, Marc <mn.o...@googlemail.com> wrote:
>
> Hi Ivan,
>
> thanks again. I have tried that and found a reasonable combination.
> Nevertheless, when I now use the analyze API with an index that has the 
> said analyzer defined via the template, it doesn't seem to be applied:
>
> This is the complete template:
> {
>     "template": "bogstash-*",
>     "settings": {
>         "index.number_of_replicas": 0,
>         "analysis": {
>             "analyzer": {
>                 "msg_excp_analyzer": {
>                     "type": "custom",
>                     "tokenizer": "whitespace",
>                     "filters": ["word_delimiter",
>                     "lowercase",
>                     "asciifolding",
>                     "shingle",
>                     "standard"]
>                 }
>             },
>             "filters": {
>                 "my_word_delimiter": {
>                     "type": "word_delimiter",
>                     "preserve_original": "true"
>                 },
>                 "my_asciifolding": {
>                     "type": "asciifolding",
>                     "preserve_original": true
>                 }
>             }
>         }
>     },
>     "mappings": {
>         "_default_": {
>             "properties": {
>                 "@excp": {
>                     "type": "string",
>                     "index": "analyzed",
>                     "analyzer": "msg_excp_analyzer"
>                 },
>                 "@msg": {
>                     "type": "string",
>                     "index": "analyzed",
>                     "analyzer": "msg_excp_analyzer"
>                 }
>             }
>         }
>     }
> }
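>
> (Worth noting: the settings above use "filters" where Elasticsearch 
> expects the key "filter", both for the custom filter definitions and for 
> the filter list inside the analyzer, so those filters are most likely 
> ignored; presumably the custom "my_word_delimiter" was also meant instead 
> of the default "word_delimiter". A corrected fragment would look like:
>
> "analyzer": {
>     "msg_excp_analyzer": {
>         "type": "custom",
>         "tokenizer": "whitespace",
>         "filter": ["my_word_delimiter", "lowercase", "asciifolding", "shingle", "standard"]
>     }
> }
> )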
> I create the index bogstash-1.
> Now I test the following:
> curl -XGET 'localhost:9200/bogstash-1/_analyze?analyzer=msg_excp_analyzer&pretty=1' -d 'Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22 (updated attributes=gps_lng: 183731222/ gps_lat: 289309222/ )'
> and it returns:
> {
>   "tokens" : [ {
>     "token" : "Service=MyMDB.onMessage",
>     "start_offset" : 0,
>     "end_offset" : 23,
>     "type" : "word",
>     "position" : 1
>   }, {
>     "token" : "appId=cs",
>     "start_offset" : 24,
>     "end_offset" : 32,
>     "type" : "word",
>     "position" : 2
>   }, {
>     "token" : "Times=Me:22/Total:22",
>     "start_offset" : 33,
>     "end_offset" : 53,
>     "type" : "word",
>     "position" : 3
>   }, {
>     "token" : "(updated",
>     "start_offset" : 54,
>     "end_offset" : 62,
>     "type" : "word",
>     "position" : 4
>   }, {
>     "token" : "attributes=gps_lng:",
>     "start_offset" : 63,
>     "end_offset" : 82,
>     "type" : "word",
>     "position" : 5
>   }, {
>     "token" : "183731222/",
>     "start_offset" : 83,
>     "end_offset" : 93,
>     "type" : "word",
>     "position" : 6
>   }, {
>     "token" : "gps_lat:",
>     "start_offset" : 94,
>     "end_offset" : 102,
>     "type" : "word",
>     "position" : 7
>   }, {
>     "token" : "289309222/",
>     "start_offset" : 103,
>     "end_offset" : 113,
>     "type" : "word",
>     "position" : 8
>   }, {
>     "token" : ")",
>     "start_offset" : 114,
>     "end_offset" : 115,
>     "type" : "word",
>     "position" : 9
>   } ]
> }
> This is the output of a standard analyzer.
> Specifying the tokenizer and filters directly in the analyze API works fine:
> curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=lowercase,word_delimiter,shingle,asciifolding,standard&pretty=1' -d 'Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22 (updated attributes=gps_lng: 183731222/ gps_lat: 289309222/ )'
> This results in:
> {
>   "tokens" : [ {
>     "token" : "service",
>     "start_offset" : 0,
>     "end_offset" : 7,
>     "type" : "word",
>     "position" : 1
>   }, {
>     "token" : "service mymdb",
>     "start_offset" : 0,
>     "end_offset" : 13,
>     "type" : "shingle",
>     "position" : 1
>   }, {
>     "token" : "mymdb",
>     "start_offset" : 8,
>     "end_offset" : 13,
>     "type" : "word",
>     "position" : 2
>   }, {
>     "token" : "mymdb onmessage",
>     "start_offset" : 8,
>     "end_offset" : 23,
>     "type" : "shingle",
>     "position" : 2
>   }, {
>     "token" : "onmessage",
>     "start_offset" : ...
>
> ...
