I didn't notice that the _all field could be unpredictable at times. There are reasons why we don't want to (or can't) predefine our mappings when creating the index, which is why I used the default_index configuration there.
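A rough way to picture why _all can behave like this with a comma-only analyzer, written as a plain-Python sketch rather than Elasticsearch code. It assumes that the field values are joined with a single space before the analyzer sees them. That separator, the helper names merge_fields and comma_tokens, and the example.com addresses are assumptions for illustration only; they are not taken from the gist or from the Elasticsearch source.

```python
def merge_fields(doc, separator=" "):
    """Join field values in document order.

    Stand-in for how a catch-all field is ASSUMED to be assembled;
    the single-space separator is an assumption, not a documented fact.
    """
    return separator.join(str(value) for value in doc.values())


def comma_tokens(text):
    """Split on commas only, dropping empty pieces (hypothetical comma analyzer)."""
    return [token for token in text.split(",") if token]


# Placeholder e-mail addresses: the real ones are not visible in the thread.
doc_with_string = {
    "random_string": "ABC,XYZ",
    "random_number": "123456,7890123",
    "random_email": "a@example.com,b@example.com",
}
doc_without_string = {
    "random_number": "123456,7890123",
    "random_email": "a@example.com,b@example.com",
}

tokens_with = comma_tokens(merge_fields(doc_with_string))
tokens_without = comma_tokens(merge_fields(doc_without_string))

print(tokens_with)
print(tokens_without)
print("123456" in tokens_with, "123456" in tokens_without)
```

Under that assumption, "123456" only survives as a token of its own when random_number happens to be the first field, because otherwise it is glued to the tail of the previous field's last value.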
What I'm doing is implementing a Google-like search with Elasticsearch, so I don't want to specify any field when searching. I figured out that I have to create another field to aggregate the terms myself instead of relying on the _all field.

Anyway, that was a great answer and it did help me understand my problem. Thanks Jörg.

On Monday, 31 March 2014 21:09:06 UTC+8, Jörg Prante wrote:
>
> This is expected behavior with the _all field.
>
> For demonstration I extended your gist a bit.
>
> https://gist.github.com/jprante/9891706
>
> Some hints:
>
> - a custom tokenizer should be used in a field that is configured in a
> mapping
>
> - always set both the search and the index analyzer for a field
>
> - avoid setting up a custom tokenizer for _all when including more than
> one field in _all (which is the default). This will give unpredictable
> results because tokens from many fields are merged into _all. In edge
> cases, for example when a field comes first, you may be able to produce a hit.
> But this is purely accidental.
>
> - when searching with the q parameter, do not forget to specify the field name
>
> Jörg
>
> On Mon, Mar 31, 2014 at 2:23 PM, Huy Phan wrote:
>
>> Hi Luca,
>>
>> The configuration index.analysis.analyzer.default_index is already set,
>> so I don't think there's a need to specify my mappings since I actually
>> want to use the comma analyzer for all the fields. And from what I
>> understand, that default_index is also applied to the _all field.
>> As you can see in my gist, I also overrode the "standard" analyzer
>> since I suspected something went wrong with default_index.
>>
>> You may ask about the default_search configuration. My query "123456" is
>> rather simple, so I don't think the default analyzer would make any changes
>> to it (and yes, I did verify that using the Analyzer API).
>>
>> Even if there's something wrong with my settings, that still doesn't
>> clearly explain why I got the result with the second document but not with
>> the first one.
>>
>> On Monday, 31 March 2014 19:45:42 UTC+8, Luca Cavanna wrote:
>>>
>>> As far as I can see from your recreation, you only create the analyzer
>>> but don't associate it with your fields by specifying your mappings. Also,
>>> when you query you don't specify the field you want to query, so you are
>>> using _all, which has its own analyzer. That means that even if you had
>>> specified the proper mappings, the query would execute against a different
>>> field with a different analyzer.
>>>
>>> On Monday, March 31, 2014 12:12:37 PM UTC+2, Huy Phan wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I bumped into this weird behavior of Elasticsearch:
>>>> https://gist.github.com/huyphan/9888959
>>>>
>>>> Basically what I did was create a comma analyzer and use it as
>>>> the default one. Then I indexed this document:
>>>>
>>>> {
>>>>   "random_string" : "ABC,XYZ",
>>>>   "random_number" : "123456,7890123",
>>>>   "random_email" : "[email protected],[email protected]"
>>>> }
>>>>
>>>> Then I searched for it with the query "123456" and got no hit. However, if I did
>>>> everything from scratch and indexed a slightly different document (it's
>>>> actually the same doc with the first field removed):
>>>>
>>>> {
>>>>   "random_number" : "123456,7890123",
>>>>   "random_email" : "[email protected],[email protected]"
>>>> }
>>>>
>>>> The same old query did give me the result. I'm not sure what the
>>>> difference between the 2 documents is that causes this behavior.
>>>>
>>>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/9b25e5f4-22a2-48e0-8ab2-4c72f4d8d25e%40googlegroups.com.
>>
>> For more options, visit https://groups.google.com/d/optout.
