Hi Dilip,

Thanks for all the information you've provided! This was a bug in Graylog's
analyze endpoint, which will be fixed in Graylog 2.0.1 (see
https://github.com/Graylog2/graylog2-server/pull/2209).
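
For anyone following along, the two different token lists seen in this thread can be approximated with a minimal Python sketch. This is not Elasticsearch or Graylog code: `whitespace_tokens` and `standard_like_tokens` are hypothetical helpers, and the regular expression is only a crude stand-in for the standard analyzer that happens to match its output for this particular test string.

```python
import re

# Test string used in the analyze requests in this thread.
TEXT = "This is a $test:[to.see.if graylog() work$]."


def whitespace_tokens(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def standard_like_tokens(text):
    """Rough approximation (an assumption, not the real analyzer):
    lowercase the text and keep runs of word characters, allowing
    dots between word characters."""
    return re.findall(r"\w+(?:\.\w+)*", text.lower())


print(whitespace_tokens(TEXT))
print(standard_like_tokens(TEXT))
```

The first list keeps tokens such as `$test:[to.see.if` and `graylog()` intact, while the second one drops the punctuation and lowercases everything, which is the behaviour reported for the affected endpoint.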

$ curl 'http://localhost:12900/messages/graylog2_73/analyze?string=This%20is%20a%20%24test%3A%5Bto.see.if%20graylog()%20work%24%5D.&pretty=true'
{
  "tokens" : [ "This", "is", "a", "$test:[to.see.if", "graylog()", "work$]." ]
}
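
If you want to build such a request URL for other test strings, the following Python sketch may help. It only constructs the URL and does not send anything; `analyze_url` is a hypothetical helper, and the path and query parameters are simply copied from the request above.

```python
from urllib.parse import quote


def analyze_url(base, index, text):
    """Build a URL for the analyze endpoint as used in this thread.

    Parentheses are left unescaped so the result matches the URL
    shown above; everything else outside the unreserved set is
    percent-encoded.
    """
    encoded = quote(text, safe="()")
    return f"{base}/messages/{index}/analyze?string={encoded}&pretty=true"


url = analyze_url(
    "http://localhost:12900",  # Graylog REST API address from the example above
    "graylog2_73",             # index name from the example above
    "This is a $test:[to.see.if graylog() work$].",
)
print(url)
```

The printed URL can be passed to curl inside single quotes, as in the request above.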



Cheers,
Jochen

On Monday, 9 May 2016 15:51:12 UTC+2, Dilip Muthukrishnan wrote:
>
> Hi Jochen,
>
> localhost:9200/_cat/indices?v reveals that graylog2_3 is the only index 
> in my Elasticsearch cluster:
>
> health status index      pri rep docs.count docs.deleted store.size pri.store.size
> green  open   graylog2_3   4   0     180443            0    139.3mb        139.3mb
>
>
> localhost:9200/_template/ reveals that the graylog-internal template which I 
> included in my previous message is the only template in the cluster.
>
>
> I should mention that when I try to tokenize the following string in 
> Elasticsearch with the index as well as the "message" field specified in the 
> URL, it works as it should, since the message field uses the whitespace 
> analyzer:
>
>
> curl 'localhost:9200/graylog2_3/_analyze?field=message&pretty=true' -d 'This is a $test:[to.see.if graylog() work$.'
>
> {
>   "tokens" : [ {
>     "token" : "This",
>     "start_offset" : 0,
>     "end_offset" : 4,
>     "type" : "word",
>     "position" : 1
>   }, {
>     "token" : "is",
>     "start_offset" : 5,
>     "end_offset" : 7,
>     "type" : "word",
>     "position" : 2
>   }, {
>     "token" : "a",
>     "start_offset" : 8,
>     "end_offset" : 9,
>     "type" : "word",
>     "position" : 3
>   }, {
>     "token" : "$test:[to.see.if",
>     "start_offset" : 10,
>     "end_offset" : 26,
>     "type" : "word",
>     "position" : 4
>   }, {
>     "token" : "graylog()",
>     "start_offset" : 27,
>     "end_offset" : 36,
>     "type" : "word",
>     "position" : 5
>   }, {
>     "token" : "work$.",
>     "start_offset" : 37,
>     "end_offset" : 43,
>     "type" : "word",
>     "position" : 6
>   } ]
> }
>
>
> This tells me that ES is using the whitespace analyzer correctly.  However, 
> the Graylog API browser is giving me a different result:
>
>
> http://localhost:12900/messages/graylog2_3/analyze?string=This%20is%20a%20%24test%3A%5Bto.see.if%20graylog()%20work%24%5D.&pretty=true
>
> {
>   "tokens" : [ "this", "is", "a", "test", "to.see.if", "graylog", "work" ]
> }
>
>
> Is this the result that I should be seeing?  Is there anything else that I 
> can test in order to help me troubleshoot this further?  Thanks.
>
> Sincerely,
>
> On Monday, May 9, 2016 at 8:49:41 AM UTC-4, Jochen Schalanda wrote:
>>
>> Hi Dilip,
>>
>> Are there any other conflicting index templates/mappings in your 
>> Elasticsearch cluster?
>>
>> Other than that, the index mapping for graylog2_3 is looking fine and ES 
>> should use the whitespace analyzer for messages indexed into this index.
>>
>> Cheers,
>> Jochen
>>
>> On Friday, 6 May 2016 22:01:42 UTC+2, Dilip Muthukrishnan wrote:
>>>
>>> Hi Jochen,
>>>
>>> I'm still stuck on this one.  Any help would be appreciated.  Thanks.
>>>
>>> Sincerely,
>>>
>>> Dilip M.
>>>
>>> On Tuesday, May 3, 2016 at 9:32:37 AM UTC-4, Dilip Muthukrishnan wrote:
>>>>
>>>> Hi Jochen,
>>>>
>>>> Here's what my "graylog-internal" template currently looks like (as 
>>>> seen via the Elasticsearch API):
>>>>
>>>> {
>>>>   "graylog-internal" : {
>>>>     "order" : 0,
>>>>     "template" : "graylog2_*",
>>>>     "settings" : { },
>>>>     "mappings" : {
>>>>       "message" : {
>>>>         "_source" : {
>>>>           "compress" : true,
>>>>           "enabled" : true
>>>>         },
>>>>         "dynamic_templates" : [ {
>>>>           "internal_fields" : {
>>>>             "mapping" : {
>>>>               "index" : "not_analyzed",
>>>>               "doc_values" : true
>>>>             },
>>>>             "match" : "gl2_*"
>>>>           }
>>>>         }, {
>>>>           "store_generic" : {
>>>>             "mapping" : {
>>>>               "index" : "not_analyzed"
>>>>             },
>>>>             "match" : "*"
>>>>           }
>>>>         } ],
>>>>         "_ttl" : {
>>>>           "enabled" : true
>>>>         },
>>>>         "properties" : {
>>>>           "message" : {
>>>>             "index" : "analyzed",
>>>>             "analyzer" : "whitespace",
>>>>             "type" : "string"
>>>>           },
>>>>           "timestamp" : {
>>>>             "format" : "yyyy-MM-dd HH:mm:ss.SSS",
>>>>             "doc_values" : true,
>>>>             "type" : "date"
>>>>           },
>>>>           "source" : {
>>>>             "index" : "analyzed",
>>>>             "analyzer" : "analyzer_keyword",
>>>>             "type" : "string"
>>>>           },
>>>>           "full_message" : {
>>>>             "index" : "analyzed",
>>>>             "analyzer" : "whitespace",
>>>>             "type" : "string"
>>>>           }
>>>>         }
>>>>       }
>>>>     },
>>>>     "aliases" : { }
>>>>   }
>>>> }
>>>>
>>>>
>>>> Here's what my graylog2_3 index currently looks like (as seen via the 
>>>> Elasticsearch API):
>>>>
>>>> {
>>>>   "graylog2_3" : {
>>>>     "aliases" : {
>>>>       "graylog2_deflector" : { }
>>>>     },
>>>>     "mappings" : {
>>>>       "message" : {
>>>>         "dynamic_templates" : [ {
>>>>           "internal_fields" : {
>>>>             "mapping" : {
>>>>               "index" : "not_analyzed",
>>>>               "doc_values" : true
>>>>             },
>>>>             "match" : "gl2_*"
>>>>           }
>>>>         }, {
>>>>           "store_generic" : {
>>>>             "mapping" : {
>>>>               "index" : "not_analyzed"
>>>>             },
>>>>             "match" : "*"
>>>>           }
>>>>         } ],
>>>>         "_ttl" : {
>>>>           "enabled" : true
>>>>         },
>>>>         "_source" : {
>>>>           "compress" : true
>>>>         },
>>>>         "properties" : {
>>>>           "full_message" : {
>>>>             "type" : "string",
>>>>             "analyzer" : "whitespace"
>>>>           },
>>>>           "gl2_remote_ip" : {
>>>>             "type" : "string",
>>>>             "index" : "not_analyzed",
>>>>             "doc_values" : true
>>>>           },
>>>>           "gl2_remote_port" : {
>>>>             "type" : "long",
>>>>             "doc_values" : true
>>>>           },
>>>>           "gl2_source_collector" : {
>>>>             "type" : "string",
>>>>             "index" : "not_analyzed",
>>>>             "doc_values" : true
>>>>           },
>>>>           "gl2_source_collector_input" : {
>>>>             "type" : "string",
>>>>             "index" : "not_analyzed",
>>>>             "doc_values" : true
>>>>           },
>>>>           "gl2_source_input" : {
>>>>             "type" : "string",
>>>>             "index" : "not_analyzed",
>>>>             "doc_values" : true
>>>>           },
>>>>           "gl2_source_node" : {
>>>>             "type" : "string",
>>>>             "index" : "not_analyzed",
>>>>             "doc_values" : true
>>>>           },
>>>>           "level" : {
>>>>             "type" : "string",
>>>>             "index" : "not_analyzed"
>>>>           },
>>>>           "message" : {
>>>>             "type" : "string",
>>>>             "analyzer" : "whitespace"
>>>>           },
>>>>           "source" : {
>>>>             "type" : "string",
>>>>             "analyzer" : "analyzer_keyword"
>>>>           },
>>>>           "source_file" : {
>>>>             "type" : "string",
>>>>             "index" : "not_analyzed"
>>>>           },
>>>>           "timestamp" : {
>>>>             "type" : "date",
>>>>             "doc_values" : true,
>>>>             "format" : "yyyy-MM-dd HH:mm:ss.SSS"
>>>>           },
>>>>           "version" : {
>>>>             "type" : "string",
>>>>             "index" : "not_analyzed"
>>>>           }
>>>>         }
>>>>       }
>>>>     },
>>>>     "settings" : {
>>>>       "index" : {
>>>>         "creation_date" : "1462197971182",
>>>>         "uuid" : "ylBuS8y3SBKRYMyLuMWApg",
>>>>         "analysis" : {
>>>>           "analyzer" : {
>>>>             "analyzer_keyword" : {
>>>>               "filter" : "lowercase",
>>>>               "tokenizer" : "keyword"
>>>>             }
>>>>           }
>>>>         },
>>>>         "number_of_replicas" : "0",
>>>>         "number_of_shards" : "4",
>>>>         "version" : {
>>>>           "created" : "1070399"
>>>>         }
>>>>       }
>>>>     },
>>>>     "warmers" : { }
>>>>   }
>>>> }
>>>>
>>>>
>>>> After cycling the deflector so that it points to the new index, 
>>>> graylog2_3, I proceeded to delete my old indices.
>>>>
>>>> Using the Graylog API browser, I tried to tokenize a random string (This 
>>>> is a $test:[to.see.if graylog() work$.):
>>>>
>>>>
>>>> http://vtor-lx-tomcat-d01:12900/messages/graylog2_3/analyze?string=This%20is%20a%20%24test%3A%5Bto.see.if%20graylog()%20work%24%5D.&pretty=true
>>>>
>>>> {
>>>>   "tokens" : [ "this", "is", "a", "test", "to.see.if", "graylog", "work" ]
>>>> }
>>>>
>>>>
>>>> This makes sense because if I attempt to tokenize the same string via 
>>>> Elasticsearch (using the same index), I get the same result:
>>>>
>>>> curl 'vtor-lx-tomcat-d01:9200/graylog2_3/_analyze?pretty=true' -d 'This is a $test:[to.see.if graylog() work$.'
>>>>
>>>> {
>>>>   "tokens" : [ {
>>>>     "token" : "this",
>>>>     "start_offset" : 0,
>>>>     "end_offset" : 4,
>>>>     "type" : "<ALPHANUM>",
>>>>     "position" : 1
>>>>   }, {
>>>>     "token" : "is",
>>>>     "start_offset" : 5,
>>>>     "end_offset" : 7,
>>>>     "type" : "<ALPHANUM>",
>>>>     "position" : 2
>>>>   }, {
>>>>     "token" : "a",
>>>>     "start_offset" : 8,
>>>>     "end_offset" : 9,
>>>>     "type" : "<ALPHANUM>",
>>>>     "position" : 3
>>>>   }, {
>>>>     "token" : "test",
>>>>     "start_offset" : 11,
>>>>     "end_offset" : 15,
>>>>     "type" : "<ALPHANUM>",
>>>>     "position" : 4
>>>>   }, {
>>>>     "token" : "to.see.if",
>>>>     "start_offset" : 17,
>>>>     "end_offset" : 26,
>>>>     "type" : "<ALPHANUM>",
>>>>     "position" : 5
>>>>   }, {
>>>>     "token" : "graylog",
>>>>     "start_offset" : 27,
>>>>     "end_offset" : 34,
>>>>     "type" : "<ALPHANUM>",
>>>>     "position" : 6
>>>>   }, {
>>>>     "token" : "work",
>>>>     "start_offset" : 37,
>>>>     "end_offset" : 41,
>>>>     "type" : "<ALPHANUM>",
>>>>     "position" : 7
>>>>   } ]
>>>> }
>>>>
>>>> However, when I drop the index from the URL and specify the whitespace 
>>>> analyzer explicitly, I get the result that I am looking for:
>>>>
>>>> curl 'vtor-lx-tomcat-d01:9200/_analyze?analyzer=whitespace&pretty=true' -d 'This is a $test:[to.see.if graylog() work$.'
>>>>
>>>> {
>>>>   "tokens" : [ {
>>>>     "token" : "This",
>>>>     "start_offset" : 0,
>>>>     "end_offset" : 4,
>>>>     "type" : "word",
>>>>     "position" : 1
>>>>   }, {
>>>>     "token" : "is",
>>>>     "start_offset" : 5,
>>>>     "end_offset" : 7,
>>>>     "type" : "word",
>>>>     "position" : 2
>>>>   }, {
>>>>     "token" : "a",
>>>>     "start_offset" : 8,
>>>>     "end_offset" : 9,
>>>>     "type" : "word",
>>>>     "position" : 3
>>>>   }, {
>>>>     "token" : "$test:[to.see.if",
>>>>     "start_offset" : 10,
>>>>     "end_offset" : 26,
>>>>     "type" : "word",
>>>>     "position" : 4
>>>>   }, {
>>>>     "token" : "graylog()",
>>>>     "start_offset" : 27,
>>>>     "end_offset" : 36,
>>>>     "type" : "word",
>>>>     "position" : 5
>>>>   }, {
>>>>     "token" : "work$.",
>>>>     "start_offset" : 37,
>>>>     "end_offset" : 43,
>>>>     "type" : "word",
>>>>     "position" : 6
>>>>   } ]
>>>> }
>>>>
>>>> I feel like I am really close to an answer here.  It appears that there 
>>>> is something wrong with my index mapping/settings.
>>>>
>>>> Sincerely,
>>>>
>>>> On Tuesday, May 3, 2016 at 3:51:49 AM UTC-4, Jochen Schalanda wrote:
>>>>>
>>>>> Hi Dilip,
>>>>>
>>>>> Are you 100% sure that the message is in a new index, that the index 
>>>>> template/mapping was properly applied (see 
>>>>> https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-get-mapping.html),
>>>>> and that it is the "message" field you were looking for (and not 
>>>>> "full_message" or another field)?
>>>>>
>>>>> Cheers,
>>>>> Jochen
>>>>>
>>>>> On Monday, 2 May 2016 18:57:40 UTC+2, Dilip Muthukrishnan wrote:
>>>>>>
>>>>>> Hi Jochen,
>>>>>>
>>>>>> Thanks for your reply.  I'm using graylog-1.3.4 (server).  I removed 
>>>>>> and added an updated version of the "graylog-internal" template and then 
>>>>>> cycled the deflector through the web interface.  The new index mapping 
>>>>>> reflects the changes:
>>>>>>
>>>>>> "message" : {
>>>>>>    "type" : "string",
>>>>>>    "analyzer" : "whitespace"
>>>>>> }
>>>>>>
>>>>>>
>>>>>> However, it doesn't appear to be reflected in the search.  This 
>>>>>> message is from the latest index, but based on this tokenization, it 
>>>>>> appears to still be using the old "standard" analyzer:
>>>>>>
>>>>>> 02.05.2016 12:47:33.488 *ERROR* [Shell Script Executor Thread for 
>>>>>> cpu.sh] com.day.crx.core.CRXSessionImpl session# 144563 opened (103) 
>>>>>> java.lang.Exception: Stack Trace 
>>>>>> at com.day.crx.core.CRXSessionImpl$Tracker.open(CRXSessionImpl.java:212) 
>>>>>> at com.day.crx.core.CRXSessionImpl$Tracker.<init>(CRXSessionImpl.java:205) 
>>>>>> at com.day.crx.core.CRXSessionImpl.<init>(CRXSessionImpl.java:179) 
>>>>>> at com.day.crx.core.CRXRepositoryImpl.createSessionInstance(CRXRepositoryImpl.java:911) 
>>>>>> at org.apache.jackrabbit.core.RepositoryImpl.createSession(RepositoryImpl.java:959) 
>>>>>> at org.apache.jackrabbit.core.SessionFactory.createAdminSession(SessionFactory.java:42) 
>>>>>> at com.day.crx.sling.server.impl.SlingRepositoryWrapper.loginAdministrative(SlingRepositoryWrapper.java:76) 
>>>>>> at com.adobe.granite.monitoring.impl.ShellScriptExecutorImpl.extractScript(ShellScriptExecutorImpl.java:161) 
>>>>>> at com.adobe.granite.monitoring.impl.ShellScriptExecutorImpl.execute(ShellScriptExecutorImpl.java:114) 
>>>>>> at com.adobe.granite.monitoring.impl.ScriptMBean.invoke(ScriptMBean.java:99) 
>>>>>> at com.adobe.granite.monitoring.impl.ScriptMBean.invoke(ScriptMBean.java:158) 
>>>>>> at com.adobe.granite.monitoring.impl.ScriptConfigImpl$ExecutionThread.run(ScriptConfigImpl.java:208) 
>>>>>> at java.lang.Thread.run(Thread.java:662)
>>>>>>
>>>>>>
>>>>>> Field terms: 02.05.2016124733.488errorshellscriptexecutorthreadfor
>>>>>> cpu.shcom.day.crx.core.crxsessionimplsession144563opened103
>>>>>> java.lang.exceptionstacktraceattracker.opencrxsessionimpl.java212
>>>>>> trackerinit205179
>>>>>> com.day.crx.core.crxrepositoryimpl.createsessioninstance
>>>>>> crxrepositoryimpl.java911
>>>>>> org.apache.jackrabbit.core.repositoryimpl.createsession
>>>>>> repositoryimpl.java959
>>>>>> org.apache.jackrabbit.core.sessionfactory.createadminsession
>>>>>> sessionfactory.java42
>>>>>> com.day.crx.sling.server.impl.slingrepositorywrapper.loginadministrative
>>>>>> slingrepositorywrapper.java76
>>>>>> com.adobe.granite.monitoring.impl.shellscriptexecutorimpl.extractscript
>>>>>> shellscriptexecutorimpl.java161
>>>>>> com.adobe.granite.monitoring.impl.shellscriptexecutorimpl.execute114
>>>>>> com.adobe.granite.monitoring.impl.scriptmbean.invokescriptmbean.java
>>>>>> 99158com.adobe.granite.monitoring.impl.scriptconfigimpl
>>>>>> executionthread.runscriptconfigimpl.java208java.lang.thread.run
>>>>>> thread.java662
>>>>>>
>>>>>> As you can see, it has been stripped of various characters like 
>>>>>> colons and parentheses.
>>>>>>
>>>>>>
>>>>>> On Monday, May 2, 2016 at 12:36:38 PM UTC-4, Jochen Schalanda wrote:
>>>>>>>
>>>>>>> Hi Dilip,
>>>>>>>
>>>>>>> The index mapping of Graylog is applied by means of an index 
>>>>>>> template. In Graylog 2.0.0, the index template will automatically be 
>>>>>>> updated, but in older versions you'll have to remove the index template 
>>>>>>> yourself for it to be recreated by Graylog.
>>>>>>>
>>>>>>> See 
>>>>>>> https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-templates.html
>>>>>>> for details.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jochen
>>>>>>>
>>>>>>> On Thursday, 28 April 2016 21:42:23 UTC+2, Dilip Muthukrishnan wrote:
>>>>>>>>
>>>>>>>> I'm trying to change the analyzer from "standard" to "whitespace". 
>>>>>>>> I've set the following property in my Graylog server configuration:
>>>>>>>>
>>>>>>>> elasticsearch_analyzer = whitespace
>>>>>>>>
>>>>>>>> It states that my change will be applied to new indices so I 
>>>>>>>> manually cycled the deflector so that it is now pointing to graylog2_1 
>>>>>>>> (previously graylog2_0).  However, the new index still uses the 
>>>>>>>> "standard" analyzer based on the mapping in Elasticsearch:
>>>>>>>>
>>>>>>>> "message" : {
>>>>>>>>             "type" : "string",
>>>>>>>>             "analyzer" : "standard"
>>>>>>>>           },
>>>>>>>>
>>>>>>>>
>>>>>>>> How do I change the analyzer?
>>>>>>>>
>>>>>>>>
