Hi everyone,

I'm facing a curious problem.

I configured a custom analyzer in my settings like this:

{
   "index":{
      "cluster.name":"test-cluster",
      "client.transport.sniff":true,
      "analysis":{
         "filter":{
            "french_elision":{
               "type":"elision",
               "articles":[
                  ...skipped...
               ]
            },
            "french_stop":{
               "type":"stop",
               "stopwords":"_french_",
               "ignore_case":true
            },
            "snowball":{
               "type":"snowball",
               "language":"french"
            }
         },
         "analyzer":{
            "my_french":{
               "type":"custom",
               "tokenizer":"standard",
               "filter":[
                  "french_elision",
                  "lowercase",
                  "french_stop",
                  "snowball"
               ]
            },
            "lower_analyzer":{
               "type":"custom",
               "tokenizer":"keyword",
               "filter":"lowercase"
            },
            "token_analyzer":{
               "type":"custom",
               "tokenizer":"whitespace"
            }
         }
      }
   }
}


My mapping declares the custom analyzer as the global analyzer for the type
'*record*', and also sets it explicitly on the '*a*' field of my records:

{
   "record":{
      "_all":{
         "enabled":false
      },
      "analyzer":"my_french",
      "properties":{
         "_uuid":{
            "type":"string",
            "store":"yes",
            "index":"not_analyzed"
         },
         "a":{
            "type":"multi_field",
            "fields":{
               "a":{
                  "type":"string",
                  "store":"yes",
                  "index":"analyzed",
                  "analyzer":"my_french"
               },
               "raw":{
                  "type":"string",
                  "store":"no",
                  "index":"not_analyzed"
               },
               "tokens":{
                  "type":"string",
                  "store":"no",
                  "index":"analyzed",
                  "analyzer":"token_analyzer"
               },
               "lower":{
                  "type":"string",
                  "store":"no",
                  "index":"analyzed",
                  "analyzer":"lower_analyzer"
               }
            }
         },
         "g_r":{
            "type":"string",
            "store":"yes",
            "index":"analyzed"
         }
      }
   }
}

So basically, I expect fields *a* and *g_r* to both be analyzed with the
*my_french* analyzer:
- *a* because the analyzer is explicitly defined in its field mapping;
- *g_r* because its field mapping defines no analyzer, so the global
analyzer, my_french, should apply.
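
The expectation above can be modelled as a simple fallback. The Python sketch below only encodes that assumption (field-level analyzer first, then the type-level one); `resolve_analyzer` and the two constants are hypothetical names for illustration, not Elasticsearch code, and nothing here shows what Elasticsearch really does when indexing.

```python
# Hypothetical model of the expected analyzer fallback; this is NOT Elasticsearch code.
TYPE_LEVEL_ANALYZER = "my_french"  # the "analyzer" set on the 'record' type

# Analyzer declared explicitly on each field in the mapping (None = none declared).
FIELD_ANALYZERS = {
    "a": "my_french",
    "a.tokens": "token_analyzer",
    "a.lower": "lower_analyzer",
    "g_r": None,
}

def resolve_analyzer(field):
    """Return the field's own analyzer if declared, else the type-level one."""
    explicit = FIELD_ANALYZERS.get(field)
    return explicit if explicit is not None else TYPE_LEVEL_ANALYZER

for name in ("a", "g_r", "a.tokens"):
    print(name, "->", resolve_analyzer(name))
```

Under this model both *a* and *g_r* resolve to my_french, which is the behaviour the mapping is meant to produce.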

And indeed, if I test the analysis with an _analyze REST request, the
output looks correct:

$ curl -XGET 'localhost:9200/test-index/_analyze?analyzer=my_french' -d "j'aime les chevaux"
{
   "tokens":[
      {
         "token":"aim",
         "start_offset":0,
         "end_offset":6,
         "type":"<ALPHANUM>",
         "position":1
      },
      {
         "token":"cheval",
         "start_offset":11,
         "end_offset":18,
         "type":"<ALPHANUM>",
         "position":3
      }
   ]
}

This is exactly what I expect from my my_french analyzer.

But when I index my data and query it, I don't get the expected results.
So I ran a facet query to see which terms were indexed for my fields, and
the result is very surprising:

Query:

{
  "query": {
    "match": {
      "_id": "12"
    }
  },
  "facets": {
    "tokens": {
      "terms": {
        "field": "a"
      }
    }
  }
}

This gives me the following result, which is not what I expected (I
expected the returned terms to be *aim* and *cheval*, as in the analysis
request above):

$ curl -X POST "http://localhost:9200/test-index/_search?pretty=true" -d \
'{"query": {"match": {"_id": "12"}},"facets": {"tokens": {"terms": {"field": "a"}}}}'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test-index",
      "_type" : "record",
      "_id" : "12",
      "_score" : 1.0,
      "_source":{"_uuid":"12","a_t":false,"a_n":false,"a":"J'aime les chevaux","b_r":null,"b_t":false,"b_n":false,"b":1407664800000,"c_r":null,"c_t":false,"c_n":false,"c":2,"d_r":"m3","d_t":true,"d_n":false,"d":null,"e_r":null,"e_t":false,"e_n":true,"e":12,"f_r":null,"f_t":false,"f_n":false,"f":true,"g_r":"J'aime les chevaux","g_t":false,"g_n":false,"g":12.0}
    } ]
  },
  "facets" : {
    "tokens" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 2,
      "other" : 0,
      "terms" : [ {
        "term" : "j'aim",
        "count" : 1
      }, {
        "term" : "cheval",
        "count" : 1
      } ]
    }
  }
}
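
One visible difference between the two runs is the input: the _analyze request received lowercase text ("j'aime les chevaux"), while the indexed document contains "J'aime les chevaux" with a capital J, and in my_french the french_elision filter runs before lowercase. The Python sketch below is a rough simulation, not Lucene's implementation. It assumes the skipped articles list holds only lowercase entries (the `ARTICLES` set is a guess), that elision matching is case-sensitive, that a whitespace split is close enough to the standard tokenizer for this sentence, and it uses a tiny stand-in stopword list and leaves out the snowball stemmer.

```python
# Rough simulation of the my_french filter order; every list below is an assumption.
ARTICLES = {"l", "m", "t", "qu", "n", "s", "j", "d", "c"}  # guessed, lowercase only
STOPWORDS = {"les", "le", "la", "des"}  # tiny stand-in for _french_

def elision(token):
    """Strip a leading article + apostrophe, matching case-sensitively."""
    head, sep, tail = token.partition("'")
    return tail if sep and head in ARTICLES else token

def analyze(text):
    tokens = text.split()                       # stand-in for the standard tokenizer
    tokens = [elision(t) for t in tokens]       # french_elision
    tokens = [t.lower() for t in tokens]        # lowercase
    return [t for t in tokens if t not in STOPWORDS]  # french_stop (snowball omitted)

print(analyze("j'aime les chevaux"))  # ['aime', 'chevaux']
print(analyze("J'aime les chevaux"))  # ["j'aime", 'chevaux']
```

Under those assumptions the capitalized input keeps its "j'" prefix, which matches the shape of the facet terms above. This is only a hypothesis to check, for example by sending the capitalized text to _analyze, not a confirmed diagnosis.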

Can anyone see what is wrong, where I made a mistake, or what I am missing?

