Hi Dilip,

are there any other conflicting index templates/mappings in your 
Elasticsearch cluster?

Other than that, the index mapping for graylog2_3 looks fine, and ES 
should use the whitespace analyzer for messages indexed into this index.
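
A minimal sketch of how both points could be checked, assuming Elasticsearch 
1.x is reachable at the host used elsewhere in this thread. The function names 
list_templates and analyze_field are made up for illustration. The first lists 
every index template so that overlapping "template" patterns can be compared. 
The second passes the "field" parameter to _analyze; as far as the 1.x 
documentation describes it, _analyze on an index without "field" or "analyzer" 
uses the index's default analyzer, not the analyzer mapped to "message".

```shell
#!/bin/sh
# Assumption: Elasticsearch 1.x at this address (host taken from the thread).
ES="${ES:-vtor-lx-tomcat-d01:9200}"

# List all index templates; compare their "template" patterns and "order".
list_templates() {
  curl -s "$ES/_template?pretty=true"
}

# Analyze text with the analyzer mapped to a single field of an index.
# $1 = index name, $2 = field name, $3 = text to analyze
analyze_field() {
  curl -s "$ES/$1/_analyze?field=$2&pretty=true" -d "$3"
}
```

For example: analyze_field graylog2_3 message 'This is a $test:[to.see.if graylog() work$.'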

Cheers,
Jochen

On Friday, 6 May 2016 22:01:42 UTC+2, Dilip Muthukrishnan wrote:
>
> Hi Jochen,
>
> I'm still stuck on this one.  Any help would be appreciated.  Thanks.
>
> Sincerely,
>
> Dilip M.
>
> On Tuesday, May 3, 2016 at 9:32:37 AM UTC-4, Dilip Muthukrishnan wrote:
>>
>> Hi Jochen,
>>
>> Here's what my "graylog-internal" template currently looks like (as seen 
>> via the Elasticsearch API):
>>
>> {
>>   "graylog-internal" : {
>>     "order" : 0,
>>     "template" : "graylog2_*",
>>     "settings" : { },
>>     "mappings" : {
>>       "message" : {
>>         "_source" : {
>>           "compress" : true,
>>           "enabled" : true
>>         },
>>         "dynamic_templates" : [ {
>>           "internal_fields" : {
>>             "mapping" : {
>>               "index" : "not_analyzed",
>>               "doc_values" : true
>>             },
>>             "match" : "gl2_*"
>>           }
>>         }, {
>>           "store_generic" : {
>>             "mapping" : {
>>               "index" : "not_analyzed"
>>             },
>>             "match" : "*"
>>           }
>>         } ],
>>         "_ttl" : {
>>           "enabled" : true
>>         },
>>         "properties" : {
>>           "message" : {
>>             "index" : "analyzed",
>>             "analyzer" : "whitespace",
>>             "type" : "string"
>>           },
>>           "timestamp" : {
>>             "format" : "yyyy-MM-dd HH:mm:ss.SSS",
>>             "doc_values" : true,
>>             "type" : "date"
>>           },
>>           "source" : {
>>             "index" : "analyzed",
>>             "analyzer" : "analyzer_keyword",
>>             "type" : "string"
>>           },
>>           "full_message" : {
>>             "index" : "analyzed",
>>             "analyzer" : "whitespace",
>>             "type" : "string"
>>           }
>>         }
>>       }
>>     },
>>     "aliases" : { }
>>   }
>> }
>>
>>
>> Here's what my graylog2_3 index currently looks like (as seen via the 
>> Elasticsearch API):
>>
>> {
>>   "graylog2_3" : {
>>     "aliases" : {
>>       "graylog2_deflector" : { }
>>     },
>>     "mappings" : {
>>       "message" : {
>>         "dynamic_templates" : [ {
>>           "internal_fields" : {
>>             "mapping" : {
>>               "index" : "not_analyzed",
>>               "doc_values" : true
>>             },
>>             "match" : "gl2_*"
>>           }
>>         }, {
>>           "store_generic" : {
>>             "mapping" : {
>>               "index" : "not_analyzed"
>>             },
>>             "match" : "*"
>>           }
>>         } ],
>>         "_ttl" : {
>>           "enabled" : true
>>         },
>>         "_source" : {
>>           "compress" : true
>>         },
>>         "properties" : {
>>           "full_message" : {
>>             "type" : "string",
>>             "analyzer" : "whitespace"
>>           },
>>           "gl2_remote_ip" : {
>>             "type" : "string",
>>             "index" : "not_analyzed",
>>             "doc_values" : true
>>           },
>>           "gl2_remote_port" : {
>>             "type" : "long",
>>             "doc_values" : true
>>           },
>>           "gl2_source_collector" : {
>>             "type" : "string",
>>             "index" : "not_analyzed",
>>             "doc_values" : true
>>           },
>>           "gl2_source_collector_input" : {
>>             "type" : "string",
>>             "index" : "not_analyzed",
>>             "doc_values" : true
>>           },
>>           "gl2_source_input" : {
>>             "type" : "string",
>>             "index" : "not_analyzed",
>>             "doc_values" : true
>>           },
>>           "gl2_source_node" : {
>>             "type" : "string",
>>             "index" : "not_analyzed",
>>             "doc_values" : true
>>           },
>>           "level" : {
>>             "type" : "string",
>>             "index" : "not_analyzed"
>>           },
>>           "message" : {
>>             "type" : "string",
>>             "analyzer" : "whitespace"
>>           },
>>           "source" : {
>>             "type" : "string",
>>             "analyzer" : "analyzer_keyword"
>>           },
>>           "source_file" : {
>>             "type" : "string",
>>             "index" : "not_analyzed"
>>           },
>>           "timestamp" : {
>>             "type" : "date",
>>             "doc_values" : true,
>>             "format" : "yyyy-MM-dd HH:mm:ss.SSS"
>>           },
>>           "version" : {
>>             "type" : "string",
>>             "index" : "not_analyzed"
>>           }
>>         }
>>       }
>>     },
>>     "settings" : {
>>       "index" : {
>>         "creation_date" : "1462197971182",
>>         "uuid" : "ylBuS8y3SBKRYMyLuMWApg",
>>         "analysis" : {
>>           "analyzer" : {
>>             "analyzer_keyword" : {
>>               "filter" : "lowercase",
>>               "tokenizer" : "keyword"
>>             }
>>           }
>>         },
>>         "number_of_replicas" : "0",
>>         "number_of_shards" : "4",
>>         "version" : {
>>           "created" : "1070399"
>>         }
>>       }
>>     },
>>     "warmers" : { }
>>   }
>> }
>>
>>
>> After cycling the deflector so that it points to the new index, 
>> graylog2_3, I proceeded to delete my old indices.
>>
>> Using the Graylog API browser, I tried to tokenize a random string (This 
>> is a $test:[to.see.if graylog() work$.):
>>
>>
>> http://vtor-lx-tomcat-d01:12900/messages/graylog2_3/analyze?string=This%20is%20a%20%24test%3A%5Bto.see.if%20graylog()%20work%24%5D.&pretty=true
>>
>> {
>>   "tokens" : [ "this", "is", "a", "test", "to.see.if", "graylog", "work" ]
>> }
>>
>>
>> This makes sense because if I attempt to tokenize the same string via 
>> Elasticsearch (using the same index), I get the same result:
>>
>> curl 'vtor-lx-tomcat-d01:9200/graylog2_3/_analyze?pretty=true' -d 'This is a $test:[to.see.if graylog() work$.'
>>
>> {
>>   "tokens" : [ {
>>     "token" : "this",
>>     "start_offset" : 0,
>>     "end_offset" : 4,
>>     "type" : "<ALPHANUM>",
>>     "position" : 1
>>   }, {
>>     "token" : "is",
>>     "start_offset" : 5,
>>     "end_offset" : 7,
>>     "type" : "<ALPHANUM>",
>>     "position" : 2
>>   }, {
>>     "token" : "a",
>>     "start_offset" : 8,
>>     "end_offset" : 9,
>>     "type" : "<ALPHANUM>",
>>     "position" : 3
>>   }, {
>>     "token" : "test",
>>     "start_offset" : 11,
>>     "end_offset" : 15,
>>     "type" : "<ALPHANUM>",
>>     "position" : 4
>>   }, {
>>     "token" : "to.see.if",
>>     "start_offset" : 17,
>>     "end_offset" : 26,
>>     "type" : "<ALPHANUM>",
>>     "position" : 5
>>   }, {
>>     "token" : "graylog",
>>     "start_offset" : 27,
>>     "end_offset" : 34,
>>     "type" : "<ALPHANUM>",
>>     "position" : 6
>>   }, {
>>     "token" : "work",
>>     "start_offset" : 37,
>>     "end_offset" : 41,
>>     "type" : "<ALPHANUM>",
>>     "position" : 7
>>   } ]
>> }
>>
>> However, without specifying the index in Elasticsearch, I get the result 
>> that I am looking for:
>>
>> curl 'vtor-lx-tomcat-d01:9200/_analyze?analyzer=whitespace&pretty=true' -d 'This is a $test:[to.see.if graylog() work$.'
>>
>> {
>>   "tokens" : [ {
>>     "token" : "This",
>>     "start_offset" : 0,
>>     "end_offset" : 4,
>>     "type" : "word",
>>     "position" : 1
>>   }, {
>>     "token" : "is",
>>     "start_offset" : 5,
>>     "end_offset" : 7,
>>     "type" : "word",
>>     "position" : 2
>>   }, {
>>     "token" : "a",
>>     "start_offset" : 8,
>>     "end_offset" : 9,
>>     "type" : "word",
>>     "position" : 3
>>   }, {
>>     "token" : "$test:[to.see.if",
>>     "start_offset" : 10,
>>     "end_offset" : 26,
>>     "type" : "word",
>>     "position" : 4
>>   }, {
>>     "token" : "graylog()",
>>     "start_offset" : 27,
>>     "end_offset" : 36,
>>     "type" : "word",
>>     "position" : 5
>>   }, {
>>     "token" : "work$.",
>>     "start_offset" : 37,
>>     "end_offset" : 43,
>>     "type" : "word",
>>     "position" : 6
>>   } ]
>> }
>>
>> I feel like I am really close to an answer here.  It appears that there 
>> is something wrong with my index mapping/settings.
>>
>> Sincerely,
>>
>> On Tuesday, May 3, 2016 at 3:51:49 AM UTC-4, Jochen Schalanda wrote:
>>>
>>> Hi Dilip,
>>>
>>> are you 100% sure that the message is in a new index, that the index 
>>> template/mapping was properly applied (see 
>>> https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-get-mapping.html), 
>>> and that it is the "message" field you were looking for (and not 
>>> "full_message" or another field)?
>>>
>>> Cheers,
>>> Jochen
>>>
>>> On Monday, 2 May 2016 18:57:40 UTC+2, Dilip Muthukrishnan wrote:
>>>>
>>>> Hi Jochen,
>>>>
>>>> Thanks for your reply.  I'm using graylog-1.3.4 (server).  I removed 
>>>> and added an updated version of the "graylog-internal" template and then 
>>>> cycled the deflector through the web interface.  The new index mapping 
>>>> reflects the changes:
>>>>
>>>> "message" : {
>>>>    "type" : "string",
>>>>    "analyzer" : "whitespace"
>>>> }
>>>>
>>>>
>>>> However, it doesn't appear to be reflected in the search.  This message 
>>>> is from the latest index but based on this tokenization, it appears to 
>>>> still be using the old "standard analyzer":
>>>>
>>>> 02.05.2016 12:47:33.488 *ERROR* [Shell Script Executor Thread for cpu.sh] com.day.crx.core.CRXSessionImpl session# 144563 opened (103)
>>>> java.lang.Exception: Stack Trace
>>>>   at com.day.crx.core.CRXSessionImpl$Tracker.open(CRXSessionImpl.java:212)
>>>>   at com.day.crx.core.CRXSessionImpl$Tracker.<init>(CRXSessionImpl.java:205)
>>>>   at com.day.crx.core.CRXSessionImpl.<init>(CRXSessionImpl.java:179)
>>>>   at com.day.crx.core.CRXRepositoryImpl.createSessionInstance(CRXRepositoryImpl.java:911)
>>>>   at org.apache.jackrabbit.core.RepositoryImpl.createSession(RepositoryImpl.java:959)
>>>>   at org.apache.jackrabbit.core.SessionFactory.createAdminSession(SessionFactory.java:42)
>>>>   at com.day.crx.sling.server.impl.SlingRepositoryWrapper.loginAdministrative(SlingRepositoryWrapper.java:76)
>>>>   at com.adobe.granite.monitoring.impl.ShellScriptExecutorImpl.extractScript(ShellScriptExecutorImpl.java:161)
>>>>   at com.adobe.granite.monitoring.impl.ShellScriptExecutorImpl.execute(ShellScriptExecutorImpl.java:114)
>>>>   at com.adobe.granite.monitoring.impl.ScriptMBean.invoke(ScriptMBean.java:99)
>>>>   at com.adobe.granite.monitoring.impl.ScriptMBean.invoke(ScriptMBean.java:158)
>>>>   at com.adobe.granite.monitoring.impl.ScriptConfigImpl$ExecutionThread.run(ScriptConfigImpl.java:208)
>>>>   at java.lang.Thread.run(Thread.java:662)
>>>>
>>>>
>>>> Field terms: 02.05.2016 12 47 33.488 error shell script executor thread 
>>>> for cpu.sh com.day.crx.core.crxsessionimpl session 144563 opened 103 
>>>> java.lang.exception stack trace at tracker.open crxsessionimpl.java 212 
>>>> tracker init 205 179 
>>>> com.day.crx.core.crxrepositoryimpl.createsessioninstance 
>>>> crxrepositoryimpl.java 911 
>>>> org.apache.jackrabbit.core.repositoryimpl.createsession 
>>>> repositoryimpl.java 959 
>>>> org.apache.jackrabbit.core.sessionfactory.createadminsession 
>>>> sessionfactory.java 42 
>>>> com.day.crx.sling.server.impl.slingrepositorywrapper.loginadministrative 
>>>> slingrepositorywrapper.java 76 
>>>> com.adobe.granite.monitoring.impl.shellscriptexecutorimpl.extractscript 
>>>> shellscriptexecutorimpl.java 161 
>>>> com.adobe.granite.monitoring.impl.shellscriptexecutorimpl.execute 114 
>>>> com.adobe.granite.monitoring.impl.scriptmbean.invoke scriptmbean.java 99 
>>>> 158 com.adobe.granite.monitoring.impl.scriptconfigimpl 
>>>> executionthread.run scriptconfigimpl.java 208 java.lang.thread.run 
>>>> thread.java 662
>>>>
>>>> As you can see, it has been stripped of various characters like colons 
>>>> and parentheses.
>>>>
>>>>
>>>> On Monday, May 2, 2016 at 12:36:38 PM UTC-4, Jochen Schalanda wrote:
>>>>>
>>>>> Hi Dilip,
>>>>>
>>>>> the index mapping of Graylog is applied by the means of an index 
>>>>> template. In Graylog 2.0.0, the index template will automatically be 
>>>>> updated but in older versions you'll have to remove the index template 
>>>>> yourself for it to be recreated by Graylog.
>>>>>
>>>>> See 
>>>>> https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-templates.html 
>>>>> for details.
>>>>>
>>>>> Cheers,
>>>>> Jochen
>>>>>
>>>>> On Thursday, 28 April 2016 21:42:23 UTC+2, Dilip Muthukrishnan wrote:
>>>>>>
>>>>>> I'm trying to change the analyzer from "standard" to "whitespace". 
>>>>>> I've set the following property in my Graylog server configuration:
>>>>>>
>>>>>> elasticsearch_analyzer = whitespace
>>>>>>
>>>>>> It states that my change will be applied to new indices so I manually 
>>>>>> cycled the deflector so that it is now pointing to graylog2_1 
>>>>>> (previously graylog2_0).  However, the new index still uses the 
>>>>>> "standard" analyzer based on the mapping in Elasticsearch:
>>>>>>
>>>>>> "message" : {
>>>>>>             "type" : "string",
>>>>>>             "analyzer" : "standard"
>>>>>>           },
>>>>>>
>>>>>>
>>>>>> How do I change the analyzer?
>>>>>>
>>>>>>
