I hadn't realized that the _all field could be this unpredictable.

There are reasons why we don't want to (or can't) predefine our mappings 
when creating the index, which is why I used the default_index 
configuration there.

What I'm trying to do is implement a Google-like search with Elasticsearch, 
so I don't want to specify any field when searching. I've figured out that 
I have to create another field and aggregate the terms into it myself 
instead of relying on the _all field.
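
A minimal sketch of that idea, done on the client side before indexing. 
The field name all_terms and both helper functions are made up for 
illustration; they are not part of Elasticsearch or of the gist. The 
random_email field is left out because its values are redacted in this 
thread.

```python
def collect_terms(doc, separator=","):
    """Return the separator-delimited terms of every string field, in field order."""
    terms = []
    for value in doc.values():
        if isinstance(value, str):
            # Split on the separator and drop empty or whitespace-only pieces.
            terms.extend(t.strip() for t in value.split(separator) if t.strip())
    return terms


def with_catch_all(doc, field="all_terms", separator=","):
    """Return a copy of doc with a hypothetical catch-all field added."""
    enriched = dict(doc)  # leave the caller's document untouched
    enriched[field] = separator.join(collect_terms(doc, separator))
    return enriched


doc = {
    "random_string": "ABC,XYZ",
    "random_number": "123456,7890123",
}
enriched = with_catch_all(doc)
print(enriched["all_terms"])
```

The enriched document would be indexed in place of the original, and the 
application would search the catch-all field explicitly (for example 
q=all_terms:123456) so the user never has to name a field. This assumes 
the comma analyzer is applied to that field at both index and search time.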

Anyway, that was a great answer and it helped me understand my problem.

Thanks Jörg.

On Monday, 31 March 2014 21:09:06 UTC+8, Jörg Prante wrote:
>
> This is expected behavior with _all field.
>
> For demonstration I extended your gist a bit.
>
> https://gist.github.com/jprante/9891706
>
> Some hints:
>
> - custom tokenizer should be used in a field that is configured in a 
> mapping
>
> - always set both search and index analyzer for a field
>
> - avoid setting up a custom tokenizer for _all when including more than 
> one field to _all (which is the default). This will give unpredictable 
> results because tokens from many fields are merged into _all. In edge 
> cases, when a field is first for example, you may be able to produce a hit. 
> But this is purely accidental.
>
> - when searching with q parameter, do not forget to specify field name
>
>
> Jörg
>
>
>
>
> On Mon, Mar 31, 2014 at 2:23 PM, Huy Phan <[email protected]> wrote:
>
>> Hi Luca,
>>
>> The configuration index.analysis.analyzer.default_index is already set 
>> so I don't think there's a need to specify my mappings since I actually 
>> want to use the comma analyzer for all the fields. And from what I 
>> understand, that default_index is also applied to _all field. 
>> As you can see in my gist, I also overrode the "standard" analyzer 
>> since I suspected something was wrong with default_index. 
>>
>> You may ask about the default_search configuration: my query "123456" is 
>> rather simple, so I don't think the default analyzer would change it 
>> (and yes, I did verify that using the Analyze API).
>>
>> Even if there's something wrong with my settings, that still doesn't 
>> clearly explain why I got the result with the second document but not with 
>> the first one. 
>>
>>
>> On Monday, 31 March 2014 19:45:42 UTC+8, Luca Cavanna wrote:
>>>
>>> As far as I can see from your recreation, you only create the analyzer 
>>> but don't associate it with your fields by specifying your mappings. 
>>> Also, when you query you don't specify the field you want to query, so 
>>> you are using the _all field, which has its own analyzer. That means 
>>> that even if you had specified the proper mappings, the query would 
>>> execute against a different field with a different analyzer.
>>>
>>> On Monday, March 31, 2014 12:12:37 PM UTC+2, Huy Phan wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I bumped into this weird behavior of Elasticsearch: 
>>>> https://gist.github.com/huyphan/9888959
>>>>
>>>> Basically, what I did was create a comma analyzer and use it as 
>>>> the default one. Then I indexed this document:
>>>>
>>>> {
>>>>     "random_string" : "ABC,XYZ",
>>>>     "random_number" : "123456,7890123",
>>>>     "random_email"  : "[email protected],[email protected]"
>>>> }
>>>>
>>>>
>>>> Then I searched for it with the query "123456" and got no hits. 
>>>> However, if I did everything from scratch and indexed a slightly 
>>>> different document (the same doc with the first field removed):
>>>>
>>>> {
>>>>     "random_number" : "123456,7890123",
>>>>     "random_email"  : "[email protected],[email protected]"
>>>> }
>>>>
>>>>  
>>>> The same query did return the document. I'm not sure what difference 
>>>> between the two documents causes this behavior.
>>>>
>>>>
>>>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/9b25e5f4-22a2-48e0-8ab2-4c72f4d8d25e%40googlegroups.com.
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

