Use the Analyze API to see which tokens are being generated. Keep it simple at first (maybe remove the shingles) and build up as you encounter more edge cases. What kind of query are you using?
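A minimal sketch of the Analyze API suggestion, in Python using only the standard library. It builds (but does not send) a POST request to the `_analyze` endpoint with a JSON body of the form `{"analyzer": ..., "text": ...}`, which is the form current Elasticsearch versions accept. The host `localhost:9200` and the index name `logs-test` are placeholders, not values from this thread. Older releases, such as the 1.x versions current when this thread was written, documented passing the analyzer as a query-string parameter, so the request shape may need adjusting for those.

```python
import json
import urllib.request

# Placeholders: adjust to your own cluster and index.
ES_URL = "http://localhost:9200"
INDEX = "logs-test"

# The sample message from this thread.
SAMPLE = ("Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22 "
          "(updated attributes=gps_lng: 183731222/ gps_lat: 289309222/ )")


def analyze_request(analyzer, text):
    """Build (but do not send) a POST /<index>/_analyze request
    whose JSON body names the analyzer and the text to analyze."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_strings(response):
    """Extract the token text from a parsed _analyze response."""
    return [t["token"] for t in response.get("tokens", [])]


if __name__ == "__main__":
    req = analyze_request("msg_excp_analyzer", SAMPLE)
    print(req.get_method(), req.full_url)
    # Against a live cluster, send it and list the tokens:
    # with urllib.request.urlopen(req) as resp:
    #     print(token_strings(json.load(resp)))
```

Each entry in the response's `tokens` array carries the token text plus offsets and a position; comparing that list with the terms you search for usually shows why a query misses. If the tokens look exactly like plain whitespace splitting, the filter chain is not being applied. Two details in the config quoted below are worth checking in that case: a custom analyzer reads its token filters from the key `filter` (the config uses `filters`), and `word_delimiter` expects `type_table` as a list of mapping strings such as `"$ => DIGIT"` rather than a JSON object.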
--
Ivan

On Thu, Aug 28, 2014 at 2:05 AM, Marc <mn.off...@googlemail.com> wrote:

> Hi Ivan,
>
> thanks for the help. Now it almost works... ;)
> I have used the following:
>
> "analysis": {
>   "analyzer": {
>     "msg_excp_analyzer": {
>       "type": "custom",
>       "tokenizer": "whitespace",
>       "filters": ["split-up",
>                   "lowercase",
>                   "shingle",
>                   "ascii-folding"]
>     }
>   },
>   "filter": {
>     "split-up": {
>       "type": "word_delimiter",
>       "preserve_original": "true",
>       "catenate_all": "true",
>       "type_table": {
>         "$": "DIGIT",
>         "%": "DIGIT",
>         ".": "DIGIT",
>         ",": "DIGIT",
>         ":": "DIGIT",
>         "/": "DIGIT",
>         "\\": "DIGIT",
>         "=": "DIGIT",
>         "&": "DIGIT",
>         "(": "DIGIT",
>         ")": "DIGIT",
>         "<": "DIGIT",
>         ">": "DIGIT",
>         "\\U+000A": "DIGIT"
>       }
>     },
>     "ascii-folding": {
>       "type": "asciifolding",
>       "preserve_original": true
>     }
>   }
>
> If the above is wrong or not reasonable, please feel free to criticize!
>
> Now the only thing that does not work is searching for subwords of
> concatenations with ".".
> With the log line Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22 (updated
> attributes=gps_lng: 183731222/ gps_lat: 289309222/ ) I cannot search for
> MyMDB or onMessage; only MyMDB.onMessage works.
>
> Any more ideas?
>
> Cheers,
> Marc
>
> On Wednesday, August 27, 2014 9:20:49 AM UTC+2, Ivan Brusic wrote:
>
>> Off the top of my head, I would use a custom analyzer with a whitespace
>> tokenizer and a word delimiter filter (preserving the original tokens as
>> well). Perhaps a shingle filter to create bigrams. Or better yet, a pattern
>> tokenizer that splits on spaces and parentheses.
>>
>> Cheers,
>>
>> Ivan
>>
>> On Tue, Aug 26, 2014 at 11:57 PM, Marc <mn.o...@googlemail.com> wrote:
>>
>>> Hi,
>>>
>>> I have quite a simple scenario that has been giving me a headache for
>>> quite a while.
>>> I have one field which is quite big and full of special characters like
>>> (,),=,:,",' digits and text.
>>> Example:
>>> "msg" : "Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22
>>> (updated attributes=gps_lng: 183731222/ gps_lat: 289309222/ )"
>>> I essentially want to be able to search these things using text,
>>> wildcards etc.
>>> So far I have tried not analyzing the content and using the wildcard
>>> search, and it doesn't work very well.
>>> Using different tokenizers and the query_string query also only works to
>>> a certain degree.
>>> For example, I want to be able to search for the following expressions:
>>> Service
>>> MyMDB
>>> onMessage
>>> MyMDB.onMessage
>>> appId=cs AND Times=Me:22
>>>
>>> and other possible permutations.
>>> What is the correct setup?! I simply can't find a solution...
>>>
>>> P.S.: the data is imported into Elasticsearch using Logstash. We access
>>> the data using the Java API (all software at the latest versions).
>>>
>>> Cheers,
>>> Marc
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>>
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/ada9c759-41e0-46ad-9941-3a0f2fb7c122%40googlegroups.com
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
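Ivan's earlier alternative, a pattern tokenizer that splits on spaces and parentheses, can be previewed offline before touching any index settings. The sketch below approximates a split-style pattern tokenizer with Python's `re.split`, dropping empty strings; it is not Elasticsearch's implementation (which uses Java regular expressions), though for simple character classes like these the syntax is the same. The second pattern, which also splits on `=`, `.`, `:` and `/`, is a hypothetical variant and not something proposed in the thread.

```python
import re

# The sample message from Marc's original post.
SAMPLE = ("Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22 "
          "(updated attributes=gps_lng: 183731222/ gps_lat: 289309222/ )")

# Split on runs of whitespace and parentheses (Ivan's suggestion).
SPACES_AND_PARENS = r"[\s()]+"
# Hypothetical variant: additionally split on = . : and /.
WITH_PUNCTUATION = r"[\s()=.:/]+"


def preview_tokens(text, pattern):
    """Approximate a split-style pattern tokenizer.

    Splits `text` on every match of `pattern` and drops empty strings
    (e.g. the one produced by the trailing " )").
    """
    return [tok for tok in re.split(pattern, text) if tok]


if __name__ == "__main__":
    print(preview_tokens(SAMPLE, SPACES_AND_PARENS))
    # ['Service=MyMDB.onMessage', 'appId=cs', 'Times=Me:22/Total:22',
    #  'updated', 'attributes=gps_lng:', '183731222/', 'gps_lat:', '289309222/']
    print(preview_tokens(SAMPLE, WITH_PUNCTUATION))
    # ['Service', 'MyMDB', 'onMessage', 'appId', 'cs', 'Times', 'Me', '22',
    #  'Total', '22', 'updated', 'attributes', 'gps_lng', '183731222',
    #  'gps_lat', '289309222']
```

With the first pattern, `MyMDB` and `onMessage` stay inside the single token `Service=MyMDB.onMessage`, so a search for either on its own would miss. The second produces them as separate tokens; the trade-off is that the combined form then has to be found through positions, for example with a `match_phrase` query, which analyzes `MyMDB.onMessage` with the same analyzer and requires the resulting tokens to be adjacent.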