Hi everyone,
I'm facing a curious problem.
I configured a custom analyzer in my settings like this:
{
  "index":{
    "cluster.name":"test-cluster",
    "client.transport.sniff":true,
    "analysis":{
      "filter":{
        "french_elision":{
          "type":"elision",
          "articles":[
            ...skipped...
          ]
        },
        "french_stop":{
          "type":"stop",
          "stopwords":"_french_",
          "ignore_case":true
        },
        "snowball":{
          "type":"snowball",
          "language":"french"
        }
      },
      "analyzer":{
        "my_french":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":[
            "french_elision",
            "lowercase",
            "french_stop",
            "snowball"
          ]
        },
        "lower_analyzer":{
          "type":"custom",
          "tokenizer":"keyword",
          "filter":"lowercase"
        },
        "token_analyzer":{
          "type":"custom",
          "tokenizer":"whitespace"
        }
      }
    }
  }
}
My mapping declares the custom analyzer as the global analyzer for the type
'*record*', and also explicitly for the '*a*' field of my records:
{
  "record":{
    "_all":{
      "enabled":false
    },
    "analyzer":"my_french",
    "properties":{
      "_uuid":{
        "type":"string",
        "store":"yes",
        "index":"not_analyzed"
      },
      "a":{
        "type":"multi_field",
        "fields":{
          "a":{
            "type":"string",
            "store":"yes",
            "index":"analyzed",
            "analyzer":"my_french"
          },
          "raw":{
            "type":"string",
            "store":"no",
            "index":"not_analyzed"
          },
          "tokens":{
            "type":"string",
            "store":"no",
            "index":"analyzed",
            "analyzer":"token_analyzer"
          },
          "lower":{
            "type":"string",
            "store":"no",
            "index":"analyzed",
            "analyzer":"lower_analyzer"
          }
        }
      },
      "g_r":{
        "type":"string",
        "store":"yes",
        "index":"analyzed"
      }
    }
  }
}
So basically, I expect fields *a* and *g_r* to be analyzed
using the *my_french* analyzer:
- *a* because it is explicitly defined in the field mapping;
- *g_r* because no analyzer is defined in the field mapping, but the global
analyzer is set to my_french.
And indeed, if I test the analysis process using an _analyze REST request,
it looks fine:
$ curl -XGET 'localhost:9200/test-index/_analyze?analyzer=my_french' -d "j'aime les chevaux"
{
  "tokens":[
    {
      "token":"aim",
      "start_offset":0,
      "end_offset":6,
      "type":"<ALPHANUM>",
      "position":1
    },
    {
      "token":"cheval",
      "start_offset":11,
      "end_offset":18,
      "type":"<ALPHANUM>",
      "position":3
    }
  ]
}
Which is exactly what I expect from my my_french analyzer.
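The request above names the analyzer directly, so it does not show which analyzer the mapping actually resolves for a given field. As a minimal sketch, assuming the `field` parameter of the `_analyze` API is available in the Elasticsearch version in use, the same text can be run through whatever analyzer is attached to *a* and *g_r*. The host, index name and sample text are the ones used above; the variable names are only illustrative.

```shell
# Values taken from the requests above; adjust to your cluster.
ES_HOST="localhost:9200"
INDEX="test-index"
TEXT="j'aime les chevaux"

# Analyze the text with the analyzer resolved from each field's mapping,
# instead of naming the analyzer explicitly.
for FIELD in a g_r; do
  URL="$ES_HOST/$INDEX/_analyze?field=$FIELD&pretty=true"
  echo "== $URL"
  # --max-time keeps the call from hanging if the node is unreachable.
  curl -s --max-time 5 -XGET "$URL" -d "$TEXT" || echo "request failed"
done
```

If the tokens returned for a field differ from those of the named-analyzer request, that would suggest the mapping is not resolving to my_french for that field.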
But when I index my data and query it, I don't get the expected results.
So I ran a facet query to see which terms have been indexed for
my fields, and the result is very surprising:
Query:
{
  "query": {
    "match": {
      "_id": "12"
    }
  },
  "facets": {
    "tokens": {
      "terms": {
        "field": "a"
      }
    }
  }
}
This gives me the following result, which is not what I expected (I
expected the returned terms to be *aim* and *cheval*, as in
the analysis request above):
$ curl -X POST "http://localhost:9200/test-index/_search?pretty=true" -d '{"query": {"match": {"_id": "12"}},"facets": {"tokens": {"terms": {"field": "a"}}}}'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test-index",
      "_type" : "record",
      "_id" : "12",
      "_score" : 1.0,
      "_source":{"_uuid":"12","a_t":false,"a_n":false,"a":"J'aime les chevaux","b_r":null,"b_t":false,"b_n":false,"b":1407664800000,"c_r":null,"c_t":false,"c_n":false,"c":2,"d_r":"m3","d_t":true,"d_n":false,"d":null,"e_r":null,"e_t":false,"e_n":true,"e":12,"f_r":null,"f_t":false,"f_n":false,"f":true,"g_r":"J'aime les chevaux","g_t":false,"g_n":false,"g":12.0}
    } ]
  },
  "facets" : {
    "tokens" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 2,
      "other" : 0,
      "terms" : [ {
        "term" : "j'aim",
        "count" : 1
      }, {
        "term" : "cheval",
        "count" : 1
      } ]
    }
  }
}
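One difference between the two runs may be worth ruling out: the analyze request earlier used a lowercase "j'aime", while the indexed document contains "J'aime" with a capital J. A possible check, sketched here with the same host, index and analyzer name as above (the variable names are only illustrative), is to analyze the text exactly as it is stored in _source:

```shell
# Values taken from the requests above; adjust to your cluster.
ES_HOST="localhost:9200"
INDEX="test-index"
ANALYZER="my_french"
# Exact value of field "a" in the indexed document, capital J included.
TEXT="J'aime les chevaux"

URL="$ES_HOST/$INDEX/_analyze?analyzer=$ANALYZER&pretty=true"
# --max-time keeps the call from hanging if the node is unreachable.
curl -s --max-time 5 -XGET "$URL" -d "$TEXT" || echo "request failed: is the node reachable at $ES_HOST?"
```

If the first token returned here is no longer *aim*, the difference would come from how the filter chain handles the uppercase input rather than from the mapping.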
Can anyone see what is wrong, where I made a mistake, or what I am missing?