Have you used the Analyze API to see which tokens are being generated? Keep
it simple at first (maybe remove the shingle filter) and build up as you
encounter more edge cases. What kind of query are you using?
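
A minimal sketch of such a check, in Python with only the standard library.
The host, the index name "logs" and the helper names are assumptions made for
illustration. It sends the analyzer name and text as a JSON body, which recent
Elasticsearch releases accept; older releases may expect the analyzer as a URL
parameter and the text as the raw request body, so adjust for your version.

```python
import json
from urllib import request


def build_analyze_request(base_url, index, analyzer, text):
    """Build (but do not send) a POST request for <base_url>/<index>/_analyze."""
    url = "{}/{}/_analyze".format(base_url.rstrip("/"), index)
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_strings(response_json):
    """Pull the token text out of a parsed _analyze response."""
    return [entry["token"] for entry in response_json.get("tokens", [])]


def fetch_tokens(base_url, index, analyzer, text):
    """Send the request and return the tokens (needs a reachable cluster)."""
    req = build_analyze_request(base_url, index, analyzer, text)
    with request.urlopen(req) as resp:
        return token_strings(json.load(resp))
```

Calling fetch_tokens("http://localhost:9200", "logs", "msg_excp_analyzer",
"Service=MyMDB.onMessage appId=cs") against a running node lists the tokens
the analyzer emits. If only the whitespace-separated tokens come back, the
token filters are probably not being applied at all; as far as I know, a
custom analyzer lists them under the key "filter", so that is worth checking.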

-- 
Ivan


On Thu, Aug 28, 2014 at 2:05 AM, Marc <mn.off...@googlemail.com> wrote:

> Hi Ivan,
>
> thanks for the help. Now it almost works... ;)
> I have used the following:
> "analysis": {
>             "analyzer": {
>                 "msg_excp_analyzer": {
>                     "type": "custom",
>                     "tokenizer": "whitespace",
>                     "filters": ["split-up",
>                     "lowercase",
>                     "shingle",
>                     "ascii-folding"]
>                 }
>             },
>             "filter": {
>                 "split-up": {
>                     "type": "word_delimiter",
>                     "preserve_original": "true",
>                     "catenate_all": "true",
>                     "type_table": {
>                         "$": "DIGIT",
>                         "%": "DIGIT",
>                         ".": "DIGIT",
>                         ",": "DIGIT",
>                         ":": "DIGIT",
>                         "/": "DIGIT",
>                         "\\": "DIGIT",
>                         "=": "DIGIT",
>                         "&": "DIGIT",
>                         "(": "DIGIT",
>                         ")": "DIGIT",
>                         "<": "DIGIT",
>                         ">": "DIGIT",
>                         "\\U+000A": "DIGIT"
>                     }
>                 },
>                 "ascii-folding": {
>                     "type": "asciifolding",
>                     "preserve_original": true
>                 }
>             }
> If the above is wrong or not reasonable, please feel free to criticize!
>
>
> Now the only thing that does not work is searching for subwords of
> concatenations with ".".
> Given the log line Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22 (updated
> attributes=gps_lng: 183731222/ gps_lat: 289309222/ ), I cannot search for
> MyMDB or onMessage; only MyMDB.onMessage will work.
>
> Any more ideas?
>
> Cheers,
> Marc
>
>
>
> On Wednesday, August 27, 2014 9:20:49 AM UTC+2, Ivan Brusic wrote:
>
>> Off the top of my head, I would use a custom analyzer with a whitespace
>> tokenizer and a word delimiter filter (preserving the original tokens as
>> well). Perhaps a shingle filter to create bigrams. Or better yet, a pattern
>> tokenizer that splits on spaces and parentheses.
>>
>> Cheers,
>>
>> Ivan
>>
>>
>> On Tue, Aug 26, 2014 at 11:57 PM, Marc <mn.o...@googlemail.com> wrote:
>>
>>> Hi,
>>>
>>> I have quite a simple scenario that has been giving me a headache for
>>> quite a while.
>>> I have one field which is quite big and full of special characters like
>>> (,),=,:,",' as well as digits and text.
>>> Example:
>>>  "msg" : "Service=MyMDB.onMessage appId=cs Times=Me:22/Total:22
>>> (updated attributes=gps_lng: 183731222/ gps_lat: 289309222/ )"
>>> I essentially want to be able to search these things using text,
>>> wildcards etc.
>>> So far I have tried not analyzing the content and using the wildcard
>>> search and it doesn't work very well.
>>> Using different tokenizers and the query_string query also only works to
>>> a certain degree.
>>> For example, I want to be able to search for the following expressions:
>>> Service
>>> MyMDB
>>> onMessage
>>> MyMDB.onMessage
>>> appId=cs AND Times=Me:22
>>>
>>> and other possible permutations.
>>> What is a correct setup?! I simply can't find a solution...
>>>
>>> PS: the data is imported into elasticsearch using logstash. We access
>>> the data using the Java API (all software at the latest versions).
>>>
>>>
>>> Cheers,
>>> Marc
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>>
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/ada9c759-41e0-46ad-9941-3a0f2fb7c122%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

