Hi,
I'm trying to use an *ngram*-based solution as a "shotgun approach" to catch 
results that are not covered by more precise analyzers.

An article describing this approach is here, for example: 
<http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/ngrams-compound-words.html>

The *match* query works as expected, including the parameter 
minimum_should_match (this parameter is very important for excluding 
documents that match only a single ngram of the queried word).

*1.* Below is the explanation of a *match* query searching for the word *"first"*, without minimum_should_match

filtered(Text:fir Text:irs Text:rst)->cache(_type:item)

and with "minimum_should_match": "80%"

filtered((Text:fir Text:irs Text:rst)*~2*)->cache(_type:item)

(here *~2* means that at least 2 of the 3 ngram clauses must match: 80% of 3 
clauses, rounded down)

*2.* But if I try the same with a *match_phrase* query and the phrase *"first 
second"*, I get the same result with or without the minimum_should_match 
parameter

filtered(Text:"(fir irs rst) (sec eco con ond)")->cache(_type:item)

For minimum_should_match I would expect something like

filtered(Text:"(fir irs rst)*~2* (sec eco con ond)*~3*")->cache(_type:item)

At the end of this message is the complete Marvel Sense code for the example 
described above.

*Does anybody know whether this is a bug or a known limitation of the 
underlying technology?*
Or is there another way to achieve better phrase query results in 
combination with an ngram analyzer?

Thanks,
Zdenek

# create a test index with a trigram analyzer on the Text field
PUT /tokenizers
{
  "settings": {
    "number_of_shards": 1, 
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "trigram": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      },
      "analyzer": {
        "trigram": {
          "tokenizer": "standard",
          "filter": [
            "trigram"
          ]
        }
      }
    }
  },
  "mappings": {
    "item": {
      "dynamic": "false",
      "properties": {
        "Text": {
          "type": "string",
          "analyzer": "trigram"
        }
      }
    }
  }
}
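
To double-check what the analyzer actually produces, the _analyze API can be 
used (this request is my addition here, not part of the original test; for 
"first" it should return the three trigrams fir, irs, rst seen in the explain 
output above):

GET /tokenizers/_analyze?analyzer=trigram&text=first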

# 1a: match query without minimum_should_match
GET /tokenizers/item/_validate/query?explain
{
  "query": {
    "match": {
      "Text": {
        "query": "first"
      }
    }
  }
}

# 1b: match query with minimum_should_match
GET /tokenizers/item/_validate/query?explain
{
  "query": {
    "match": {
      "Text": {
        "query": "first",
        "minimum_should_match": "80%"
      }
    }
  }
}

# 2a: match_phrase query without minimum_should_match
GET /tokenizers/item/_validate/query?explain
{
  "query": {
    "match_phrase": {
      "Text": {
        "query": "first second"
      }
    }
  }
}

# 2b: match_phrase query with minimum_should_match (explains identically to 2a)
GET /tokenizers/item/_validate/query?explain
{
  "query": {
    "match_phrase": {
      "Text": {
        "query": "first second",
        "minimum_should_match": "80%"
      }
    }
  }
}
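
For completeness, the behavior can also be checked against real data by 
indexing a sample document and running the queries through _search instead of 
_validate/query (again just a sketch on top of the example above):

PUT /tokenizers/item/1
{
  "Text": "first second"
}

POST /tokenizers/_refresh

GET /tokenizers/item/_search
{
  "query": {
    "match_phrase": {
      "Text": {
        "query": "first second",
        "minimum_should_match": "80%"
      }
    }
  }
}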
