Hi,
I am trying to use an *ngram*-based solution as a "shotgun approach" to get
results that are not covered by more precise analyzers.
An article describing this approach is here, for example:
<http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/ngrams-compound-words.html>
The *match* query works as expected, including the parameter
minimum_should_match (this parameter is very important for being able to
exclude matches of only one ngram from the queried word).
*1.* Below is the explanation of a *match* query searching for the word
*"first"*, without minimum_should_match:
filtered(Text:fir Text:irs Text:rst)->cache(_type:item)
and with "minimum_should_match": "80%":
filtered((Text:fir Text:irs Text:rst)*~2*)->cache(_type:item)
*2.* But if I try the same with a *match_phrase* query and the phrase *"first
second"*, I get the same result with or without the minimum_should_match
parameter:
filtered(Text:"(fir irs rst) (sec eco con ond)")->cache(_type:item)
For minimum_should_match, I am expecting something like
filtered(Text:"(fir irs rst)*~2* (sec eco con ond)*~3*")->cache(_type:item)
The attached file contains the complete Marvel Sense code for the example
described above.
*Does anybody know whether this is a bug or a known limitation of the
underlying technology?*
Or perhaps there is another way to achieve better phrase query results in
combination with an ngram analyzer?
Thanks,
Zdenek
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/0c637de8-0c27-4097-a94c-88247fa012f2%40googlegroups.com.
PUT /tokenizers
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "trigram": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      },
      "analyzer": {
        "trigram": {
          "tokenizer": "standard",
          "filter": [
            "trigram"
          ]
        }
      }
    }
  },
  "mappings": {
    "item": {
      "dynamic": "false",
      "properties": {
        "Text": {
          "type": "string",
          "analyzer": "trigram"
        }
      }
    }
  }
}
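(As a sanity check of the mapping above, the _analyze API can show which tokens the trigram analyzer emits; assuming the query-string form available in this version of Elasticsearch, a request like the following should return the three tokens fir, irs, rst for "first":

```json
GET /tokenizers/_analyze?analyzer=trigram&text=first
```

)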
GET /tokenizers/item/_validate/query?explain
{
  "query": {
    "match": {
      "Text": {
        "query": "first"
      }
    }
  }
}
GET /tokenizers/item/_validate/query?explain
{
  "query": {
    "match": {
      "Text": {
        "query": "first",
        "minimum_should_match": "80%"
      }
    }
  }
}
GET /tokenizers/item/_validate/query?explain
{
  "query": {
    "match_phrase": {
      "Text": {
        "query": "first second"
      }
    }
  }
}
GET /tokenizers/item/_validate/query?explain
{
  "query": {
    "match_phrase": {
      "Text": {
        "query": "first second",
        "minimum_should_match": "80%"
      }
    }
  }
}