Hi Dilip, thanks for all the information you've provided! This was a bug in Graylog's analyze endpoint which will be fixed in Graylog 2.0.1 (see https://github.com/Graylog2/graylog2-server/pull/2209).
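The text to analyze is passed in the `string` query parameter, so it has to be percent-encoded first. Here is a minimal Python sketch of building such a URL. The helper `analyze_url` is hypothetical and not part of Graylog. Host, port and index name are taken from this thread.

```python
from urllib.parse import quote


def analyze_url(base: str, index: str, text: str, pretty: bool = True) -> str:
    """Build a Graylog REST analyze URL for `text` (hypothetical helper).

    Layout as used in this thread:
    <base>/messages/<index>/analyze?string=<percent-encoded text>&pretty=true
    """
    # Keep "(" and ")" literal to match the hand-encoded URL in the thread;
    # without safe="()", quote() would emit %28/%29, which decodes the same.
    encoded = quote(text, safe="()")
    url = f"{base.rstrip('/')}/messages/{quote(index, safe='')}/analyze?string={encoded}"
    if pretty:
        url += "&pretty=true"
    return url


if __name__ == "__main__":
    print(analyze_url("http://localhost:12900", "graylog2_73",
                      "This is a $test:[to.see.if graylog() work$]."))
```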
$ curl 'http://localhost:12900/messages/graylog2_73/analyze?string=This%20is%20a%20%24test%3A%5Bto.see.if%20graylog()%20work%24%5D.&pretty=true'
{
  "tokens" : [ "This", "is", "a", "$test:[to.see.if", "graylog()", "work$]." ]
}

Cheers,
Jochen

On Monday, 9 May 2016 15:51:12 UTC+2, Dilip Muthukrishnan wrote:
>
> Hi Jochen,
>
> localhost:9200/_cat/indices?v reveals that graylog2_3 is the only index
> in my Elasticsearch cluster:
>
> health status index      pri rep docs.count docs.deleted store.size pri.store.size
> green  open   graylog2_3   4   0     180443            0    139.3mb        139.3mb
>
> localhost:9200/_template/ reveals that the graylog-internal template which I
> included in my previous message is the only template in the cluster.
>
> I should mention that when I try to tokenize the following string in
> Elasticsearch with the index as well as the "message" field specified in the
> URL, it works as it should, since the message field uses the whitespace
> analyzer:
>
> curl 'localhost:9200/graylog2_3/_analyze?field=message&pretty=true' -d 'This is a $test:[to.see.if graylog() work$.'
>
> "tokens" : [ {
>   "token" : "This",
>   "start_offset" : 0,
>   "end_offset" : 4,
>   "type" : "word",
>   "position" : 1
> }, {
>   "token" : "is",
>   "start_offset" : 5,
>   "end_offset" : 7,
>   "type" : "word",
>   "position" : 2
> }, {
>   "token" : "a",
>   "start_offset" : 8,
>   "end_offset" : 9,
>   "type" : "word",
>   "position" : 3
> }, {
>   "token" : "$test:[to.see.if",
>   "start_offset" : 10,
>   "end_offset" : 26,
>   "type" : "word",
>   "position" : 4
> }, {
>   "token" : "graylog()",
>   "start_offset" : 27,
>   "end_offset" : 36,
>   "type" : "word",
>   "position" : 5
> }, {
>   "token" : "work$.",
>   "start_offset" : 37,
>   "end_offset" : 43,
>   "type" : "word",
>   "position" : 6
> } ]
> }
>
> This tells me that ES is using the whitespace analyzer correctly.
> However, the Graylog API browser is giving me a different result:
>
> http://localhost:12900/messages/graylog2_3/analyze?string=This%20is%20a%20%24test%3A%5Bto.see.if%20graylog()%20work%24%5D.&pretty=true
>
> {
>   "tokens" : [ "this", "is", "a", "test", "to.see.if", "graylog", "work" ]
> }
>
> Is this the result that I should be seeing? Is there anything else that I
> can test in order to help me troubleshoot this further? Thanks.
>
> Sincerely,
>
> On Monday, May 9, 2016 at 8:49:41 AM UTC-4, Jochen Schalanda wrote:
>>
>> Hi Dilip,
>>
>> are there any other conflicting index templates/mappings in your
>> Elasticsearch cluster?
>>
>> Other than that, the index mapping for graylog2_3 is looking fine and ES
>> should use the whitespace analyzer for messages indexed into this index.
>>
>> Cheers,
>> Jochen
>>
>> On Friday, 6 May 2016 22:01:42 UTC+2, Dilip Muthukrishnan wrote:
>>>
>>> Hi Jochen,
>>>
>>> I'm still stuck on this one. Any help would be appreciated. Thanks.
>>>
>>> Sincerely,
>>>
>>> Dilip M.
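One way to double-check Jochen's reading of the mapping is to inspect the JSON that Elasticsearch returns for the index. For example, `GET /graylog2_3/_mapping` in Elasticsearch 1.x uses the layout `{index: {"mappings": {type: {"properties": {...}}}}}`. The Python sketch below is a minimal, hypothetical helper called `field_analyzer`. Its sample data is an excerpt of the graylog2_3 mapping quoted in this thread. In Graylog's mapping, the document type is also called `message`.

```python
import json
from typing import Optional


def field_analyzer(index_info: dict, index: str, doc_type: str, field: str) -> Optional[str]:
    """Return how a string field in an Elasticsearch 1.x mapping is analyzed.

    Hypothetical helper. Returns the explicitly configured analyzer name,
    "not_analyzed" for fields that are not analyzed at all, or None when the
    mapping sets no analyzer (ES then uses the index's default analyzer,
    "standard" unless configured otherwise). Raises KeyError if the index,
    type or field is missing from the mapping.
    """
    spec = index_info[index]["mappings"][doc_type]["properties"][field]
    if spec.get("index") == "not_analyzed":
        return "not_analyzed"
    return spec.get("analyzer")


# Excerpt of the graylog2_3 mapping posted in this thread
# (e.g. the body of GET /graylog2_3/_mapping).
SAMPLE = """
{
  "graylog2_3" : {
    "mappings" : {
      "message" : {
        "properties" : {
          "full_message" : { "type" : "string", "analyzer" : "whitespace" },
          "level" : { "type" : "string", "index" : "not_analyzed" },
          "message" : { "type" : "string", "analyzer" : "whitespace" },
          "source" : { "type" : "string", "analyzer" : "analyzer_keyword" }
        }
      }
    }
  }
}
"""

if __name__ == "__main__":
    mapping = json.loads(SAMPLE)
    for name in ("message", "full_message", "source", "level"):
        print(name, field_analyzer(mapping, "graylog2_3", "message", name))
```

The mapping alone does not decide what an `_analyze` call returns. In this thread, `/graylog2_3/_analyze?field=message` applies the field's whitespace analyzer. A plain `/graylog2_3/_analyze` call without a field produces lowercased, punctuation-stripped tokens instead.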
>>> >>> On Tuesday, May 3, 2016 at 9:32:37 AM UTC-4, Dilip Muthukrishnan wrote: >>>> >>>> Hi Jochen, >>>> >>>> Here's what my "graylog-internal" template currently looks like (as >>>> seen via the Elasticsearch API): >>>> >>>> { >>>> "graylog-internal" : { >>>> "order" : 0, >>>> "template" : "graylog2_*", >>>> "settings" : { }, >>>> "mappings" : { >>>> "message" : { >>>> "_source" : { >>>> "compress" : true, >>>> "enabled" : true >>>> }, >>>> "dynamic_templates" : [ { >>>> "internal_fields" : { >>>> "mapping" : { >>>> "index" : "not_analyzed", >>>> "doc_values" : true >>>> }, >>>> "match" : "gl2_*" >>>> } >>>> }, { >>>> "store_generic" : { >>>> "mapping" : { >>>> "index" : "not_analyzed" >>>> }, >>>> "match" : "*" >>>> } >>>> } ], >>>> "_ttl" : { >>>> "enabled" : true >>>> }, >>>> "properties" : { >>>> "message" : { >>>> "index" : "analyzed", >>>> "analyzer" : "whitespace", >>>> "type" : "string" >>>> }, >>>> "timestamp" : { >>>> "format" : "yyyy-MM-dd HH:mm:ss.SSS", >>>> "doc_values" : true, >>>> "type" : "date" >>>> }, >>>> "source" : { >>>> "index" : "analyzed", >>>> "analyzer" : "analyzer_keyword", >>>> "type" : "string" >>>> }, >>>> "full_message" : { >>>> "index" : "analyzed", >>>> "analyzer" : "whitespace", >>>> "type" : "string" >>>> } >>>> } >>>> } >>>> }, >>>> "aliases" : { } >>>> } >>>> } >>>> >>>> >>>> Here's what my graylog2_3 index currently looks like (as seen via the >>>> Elasticsearch API): >>>> >>>> { >>>> "graylog2_3" : { >>>> "aliases" : { >>>> "graylog2_deflector" : { } >>>> }, >>>> "mappings" : { >>>> "message" : { >>>> "dynamic_templates" : [ { >>>> "internal_fields" : { >>>> "mapping" : { >>>> "index" : "not_analyzed", >>>> "doc_values" : true >>>> }, >>>> "match" : "gl2_*" >>>> } >>>> }, { >>>> "store_generic" : { >>>> "mapping" : { >>>> "index" : "not_analyzed" >>>> }, >>>> "match" : "*" >>>> } >>>> } ], >>>> "_ttl" : { >>>> "enabled" : true >>>> }, >>>> "_source" : { >>>> "compress" : true >>>> }, >>>> "properties" : { >>>> "full_message" 
: { >>>> "type" : "string", >>>> "analyzer" : "whitespace" >>>> }, >>>> "gl2_remote_ip" : { >>>> "type" : "string", >>>> "index" : "not_analyzed", >>>> "doc_values" : true >>>> }, >>>> "gl2_remote_port" : { >>>> "type" : "long", >>>> "doc_values" : true >>>> }, >>>> "gl2_source_collector" : { >>>> "type" : "string", >>>> "index" : "not_analyzed", >>>> "doc_values" : true >>>> }, >>>> "gl2_source_collector_input" : { >>>> "type" : "string", >>>> "index" : "not_analyzed", >>>> "doc_values" : true >>>> }, >>>> "gl2_source_input" : { >>>> "type" : "string", >>>> "index" : "not_analyzed", >>>> "doc_values" : true >>>> }, >>>> "gl2_source_node" : { >>>> "type" : "string", >>>> "index" : "not_analyzed", >>>> "doc_values" : true >>>> }, >>>> "level" : { >>>> "type" : "string", >>>> "index" : "not_analyzed" >>>> }, >>>> "message" : { >>>> "type" : "string", >>>> "analyzer" : "whitespace" >>>> }, >>>> "source" : { >>>> "type" : "string", >>>> "analyzer" : "analyzer_keyword" >>>> }, >>>> "source_file" : { >>>> "type" : "string", >>>> "index" : "not_analyzed" >>>> }, >>>> "timestamp" : { >>>> "type" : "date", >>>> "doc_values" : true, >>>> "format" : "yyyy-MM-dd HH:mm:ss.SSS" >>>> }, >>>> "version" : { >>>> "type" : "string", >>>> "index" : "not_analyzed" >>>> } >>>> } >>>> } >>>> }, >>>> "settings" : { >>>> "index" : { >>>> "creation_date" : "1462197971182", >>>> "uuid" : "ylBuS8y3SBKRYMyLuMWApg", >>>> "analysis" : { >>>> "analyzer" : { >>>> "analyzer_keyword" : { >>>> "filter" : "lowercase", >>>> "tokenizer" : "keyword" >>>> } >>>> } >>>> }, >>>> "number_of_replicas" : "0", >>>> "number_of_shards" : "4", >>>> "version" : { >>>> "created" : "1070399" >>>> } >>>> } >>>> }, >>>> "warmers" : { } >>>> } >>>> } >>>> >>>> >>>> After cycling the deflector so that it points to the new index, >>>> graylog2_3, I proceeded to delete my old indices. 
>>>> >>>> Using the Graylog API browser, I tried to tokenize a random string (This >>>> is a $test:[to.see.if graylog() work$.): >>>> >>>> >>>> http://vtor-lx-tomcat-d01:12900/messages/graylog2_3/analyze?string=This%20is%20a%20%24test%3A%5Bto.see.if%20graylog()%20work%24%5D.&pretty=true >>>> >>>> { >>>> "tokens" : [ "this", "is", "a", "test", "to.see.if", "graylog", "work" ] >>>> } >>>> >>>> >>>> This makes sense because if I attempt to tokenize the same string via >>>> Elasticsearch (using the same index), I get the same result: >>>> >>>> curl 'vtor-lx-tomcat-d01:9200/graylog2_3/_analyze?pretty=true' -d 'This >>>> is a $test:[to.see.if graylog() work$.' >>>> >>>> "tokens" : [ { >>>> "token" : "this", >>>> "start_offset" : 0, >>>> "end_offset" : 4, >>>> "type" : "<ALPHANUM>", >>>> "position" : 1 >>>> }, { >>>> "token" : "is", >>>> "start_offset" : 5, >>>> "end_offset" : 7, >>>> "type" : "<ALPHANUM>", >>>> "position" : 2 >>>> }, { >>>> "token" : "a", >>>> "start_offset" : 8, >>>> "end_offset" : 9, >>>> "type" : "<ALPHANUM>", >>>> "position" : 3 >>>> }, { >>>> "token" : "test", >>>> "start_offset" : 11, >>>> "end_offset" : 15, >>>> "type" : "<ALPHANUM>", >>>> "position" : 4 >>>> }, { >>>> "token" : "to.see.if", >>>> "start_offset" : 17, >>>> "end_offset" : 26, >>>> "type" : "<ALPHANUM>", >>>> "position" : 5 >>>> }, { >>>> "token" : "graylog", >>>> "start_offset" : 27, >>>> "end_offset" : 34, >>>> "type" : "<ALPHANUM>", >>>> "position" : 6 >>>> }, { >>>> "token" : "work", >>>> "start_offset" : 37, >>>> "end_offset" : 41, >>>> "type" : "<ALPHANUM>", >>>> "position" : 7 >>>> } ] >>>> } >>>> >>>> However, without specifying the index in Elasticsearch, I get the >>>> result that I am looking for: >>>> >>>> curl 'vtor-lx-tomcat-d01:9200/_analyze?analyzer=whitespace&pretty=true' >>>> -d 'This is a $test:[to.see.if graylog() work$.' 
>>>> >>>> "tokens" : [ { >>>> "token" : "This", >>>> "start_offset" : 0, >>>> "end_offset" : 4, >>>> "type" : "word", >>>> "position" : 1 >>>> }, { >>>> "token" : "is", >>>> "start_offset" : 5, >>>> "end_offset" : 7, >>>> "type" : "word", >>>> "position" : 2 >>>> }, { >>>> "token" : "a", >>>> "start_offset" : 8, >>>> "end_offset" : 9, >>>> "type" : "word", >>>> "position" : 3 >>>> }, { >>>> "token" : "$test:[to.see.if", >>>> "start_offset" : 10, >>>> "end_offset" : 26, >>>> "type" : "word", >>>> "position" : 4 >>>> }, { >>>> "token" : "graylog()", >>>> "start_offset" : 27, >>>> "end_offset" : 36, >>>> "type" : "word", >>>> "position" : 5 >>>> }, { >>>> "token" : "work$.", >>>> "start_offset" : 37, >>>> "end_offset" : 43, >>>> "type" : "word", >>>> "position" : 6 >>>> } ] >>>> } >>>> >>>> I feel like I am really close to an answer here. It appears that there >>>> is something wrong with my index mapping/settings. >>>> >>>> Sincerely, >>>> >>>> On Tuesday, May 3, 2016 at 3:51:49 AM UTC-4, Jochen Schalanda wrote: >>>>> >>>>> Hi Dilip, >>>>> >>>>> are you 100% sure that the message is in a new index, that the index >>>>> template/mapping was properly applied (see >>>>> https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-get-mapping.html), >>>>> >>>>> and that it is the "message" field you were looking for (and not >>>>> "full_message" or another field)? >>>>> >>>>> Cheers, >>>>> Jochen >>>>> >>>>> On Monday, 2 May 2016 18:57:40 UTC+2, Dilip Muthukrishnan wrote: >>>>>> >>>>>> Hi Jochen, >>>>>> >>>>>> Thanks for your reply. I'm using graylog-1.3.4 (server). I removed >>>>>> and added an updated version of the "graylog-internal" template and then >>>>>> cycled the deflector through the web interface. The new index mapping >>>>>> reflects the changes: >>>>>> >>>>>> "message" : { >>>>>> "type" : "string", >>>>>> "analyzer" : "whitespace" >>>>>> } >>>>>> >>>>>> >>>>>> However, it doesn't appear to be reflected in the search. 
This >>>>>> message is from the latest index but based on this tokenization, it >>>>>> appears >>>>>> to still be using the old "standard analyzer": >>>>>> >>>>>> 02.05.2016 12:47:33.488 *ERROR* [Shell Script Executor Thread for >>>>>> cpu.sh] com.day.crx.core.CRXSessionImpl session# 144563 opened (103) >>>>>> java.lang.Exception: Stack Trace at >>>>>> com.day.crx.core.CRXSessionImpl$Tracker.open(CRXSessionImpl.java:212) at >>>>>> com.day.crx.core.CRXSessionImpl$Tracker.<init>(CRXSessionImpl.java:205) >>>>>> at >>>>>> com.day.crx.core.CRXSessionImpl.<init>(CRXSessionImpl.java:179) at >>>>>> com.day.crx.core.CRXRepositoryImpl.createSessionInstance(CRXRepositoryImpl.java:911) >>>>>> >>>>>> at >>>>>> org.apache.jackrabbit.core.RepositoryImpl.createSession(RepositoryImpl.java:959) >>>>>> >>>>>> at >>>>>> org.apache.jackrabbit.core.SessionFactory.createAdminSession(SessionFactory.java:42) >>>>>> >>>>>> at >>>>>> com.day.crx.sling.server.impl.SlingRepositoryWrapper.loginAdministrative(SlingRepositoryWrapper.java:76) >>>>>> >>>>>> at >>>>>> com.adobe.granite.monitoring.impl.ShellScriptExecutorImpl.extractScript(ShellScriptExecutorImpl.java:161) >>>>>> >>>>>> at >>>>>> com.adobe.granite.monitoring.impl.ShellScriptExecutorImpl.execute(ShellScriptExecutorImpl.java:114) >>>>>> >>>>>> at >>>>>> com.adobe.granite.monitoring.impl.ScriptMBean.invoke(ScriptMBean.java:99) >>>>>> >>>>>> at >>>>>> com.adobe.granite.monitoring.impl.ScriptMBean.invoke(ScriptMBean.java:158) >>>>>> >>>>>> at >>>>>> com.adobe.granite.monitoring.impl.ScriptConfigImpl$ExecutionThread.run(ScriptConfigImpl.java:208) >>>>>> >>>>>> at java.lang.Thread.run(Thread.java:662) >>>>>> >>>>>> >>>>>> Field terms: 02.05.2016124733.488errorshellscriptexecutorthreadfor >>>>>> cpu.shcom.day.crx.core.crxsessionimplsession144563opened103 >>>>>> java.lang.exceptionstacktraceattracker.opencrxsessionimpl.java212 >>>>>> trackerinit205179 >>>>>> com.day.crx.core.crxrepositoryimpl.createsessioninstance >>>>>> 
crxrepositoryimpl.java911 >>>>>> org.apache.jackrabbit.core.repositoryimpl.createsession >>>>>> repositoryimpl.java959 >>>>>> org.apache.jackrabbit.core.sessionfactory.createadminsession >>>>>> sessionfactory.java42 >>>>>> com.day.crx.sling.server.impl.slingrepositorywrapper.loginadministrative >>>>>> slingrepositorywrapper.java76 >>>>>> com.adobe.granite.monitoring.impl.shellscriptexecutorimpl.extractscript >>>>>> shellscriptexecutorimpl.java161 >>>>>> com.adobe.granite.monitoring.impl.shellscriptexecutorimpl.execute114 >>>>>> com.adobe.granite.monitoring.impl.scriptmbean.invokescriptmbean.java >>>>>> 99158com.adobe.granite.monitoring.impl.scriptconfigimpl >>>>>> executionthread.runscriptconfigimpl.java208java.lang.thread.run >>>>>> thread.java662 >>>>>> >>>>>> As you can see, it has been stripped of various characters like >>>>>> colons and parentheses. >>>>>> >>>>>> >>>>>> On Monday, May 2, 2016 at 12:36:38 PM UTC-4, Jochen Schalanda wrote: >>>>>>> >>>>>>> Hi Dilip, >>>>>>> >>>>>>> the index mapping of Graylog is applied by the means of an index >>>>>>> template. In Graylog 2.0.0, the index template will automatically be >>>>>>> updated but in older versions you'll have to remove the index template >>>>>>> yourself for it to be recreated by Graylog. >>>>>>> >>>>>>> See >>>>>>> https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-templates.html >>>>>>> >>>>>>> for details. >>>>>>> >>>>>>> Cheers, >>>>>>> Jochen >>>>>>> >>>>>>> On Thursday, 28 April 2016 21:42:23 UTC+2, Dilip Muthukrishnan wrote: >>>>>>>> >>>>>>>> I'm trying to change the analyzer from "standard" to "whitespace". >>>>>>>> I've set the following property in my Graylog server configuration: >>>>>>>> >>>>>>>> elasticsearch_analyzer = whitespace >>>>>>>> >>>>>>>> It states that my change will be applied to new indices so I >>>>>>>> manually cycled the deflector so that it is now pointing to graylog2_1 >>>>>>>> (previously graylog2_0). 
However, the new index still uses the >>>>>>>> "standard" >>>>>>>> analyzer based on the mapping in Elasticsearch: >>>>>>>> >>>>>>>> "message" : { >>>>>>>> "type" : "string", >>>>>>>> "analyzer" : "standard" >>>>>>>> }, >>>>>>>> >>>>>>>> >>>>>>>> How do I change the analyzer?
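The whitespace analyzer asked about here only splits on whitespace. It neither lowercases nor strips characters such as `$`, `:`, `[` or `()`. That is why the `_analyze` calls on the `message` field in this thread return tokens like `$test:[to.see.if` and `graylog()`. The Python sketch below approximates that behaviour. It is not Lucene's implementation and ignores details such as the tokenizer's maximum token length. `whitespace_tokens` is a hypothetical helper name.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    token: str
    start_offset: int
    end_offset: int
    position: int  # 1-based, as in the Elasticsearch 1.x _analyze output quoted in this thread


def whitespace_tokens(text: str) -> List[Token]:
    """Split `text` on runs of whitespace, keeping every other character.

    Hypothetical helper approximating Lucene's whitespace tokenizer:
    no lowercasing and no removal of punctuation.
    """
    return [
        Token(m.group(), m.start(), m.end(), i)
        for i, m in enumerate(re.finditer(r"\S+", text), start=1)
    ]


if __name__ == "__main__":
    for t in whitespace_tokens("This is a $test:[to.see.if graylog() work$."):
        print(t)
```

For the test string used in this thread, the tokens and offsets match the `field=message` output above: `This` (0 to 4) through `work$.` (37 to 43).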
