Hi all,
(scroll to bottom for question)
I was setting up a simple web app to play around with phonetic filters.
The idea is simple: I create one document for each word in the English
dictionary. Each document contains a single search field holding the
word after it has been processed by the following analyzer definition
(in our own DSL syntax, which gets transformed to Java):
analyzer soundslike {
    tokenizer = KeywordTokenizer
    tokenfilter = LowerCaseFilter
    tokenfilter = PhoneticFilter(encoder="DoubleMetaphone", inject="true")
}
I can run the web app and I get results that indeed (in some way) sound
like the original query term.
But what confuses me is the ranking of the results, given that I set
the inject param to true. If I search for the query term 'compete', the
parsed query becomes '(value:KMPT value:compete)', so I expect the word
'compete' to be ranked higher than any other word... but this wasn't
the case.
Looking further at the explanation of the results, I saw that the term
'compete' from the parsed query does not appear at all, and only the
phonetic encoding seems to affect the ranking:
COMPETITOR
4.368826 = (MATCH) sum of:
  4.368826 = (MATCH) weight(value:KMPT in 3174), product of:
    0.52838135 = queryWeight(value:KMPT), product of:
      8.26832 = idf(docFreq=150, maxDocs=216555)
      0.063904315 = queryNorm
    8.26832 = (MATCH) fieldWeight(value:KMPT in 3174), product of:
      1.0 = tf(termFreq(value:KMPT)=1)
      8.26832 = idf(docFreq=150, maxDocs=216555)
      1.0 = fieldNorm(field=value, doc=3174)
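As a sanity check on the numbers in this explanation, here is a minimal
Java sketch that recomputes them. It assumes the default Lucene 3.x
similarity (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(termFreq)).
The class and method names are made up for illustration, and queryNorm
is simply copied from the explanation rather than derived.

```java
// Hypothetical helper, not part of Lucene: recomputes the explain() arithmetic.
class ExplainCheck {

    // Assumed DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // Assumed DefaultSimilarity tf: sqrt(termFreq)
    static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    // Score of a single term clause: queryWeight * fieldWeight
    static double termScore(int termFreq, int docFreq, int maxDocs,
                            double queryNorm, double fieldNorm) {
        double idf = idf(docFreq, maxDocs);
        double queryWeight = idf * queryNorm;                // idf * queryNorm
        double fieldWeight = tf(termFreq) * idf * fieldNorm; // tf * idf * fieldNorm
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        double queryNorm = 0.063904315; // copied from the explanation above
        double fieldNorm = 1.0;         // copied from the explanation above

        System.out.println("idf(KMPT)   = " + idf(150, 216555));
        System.out.println("score(KMPT) = "
                + termScore(1, 150, 216555, queryNorm, fieldNorm));
    }
}
```

Up to float rounding, the values agree with the explanation, so the
whole score of COMPETITOR is accounted for by value:KMPT alone; no
clause for value:compete contributes anything.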
The next thing I did was running our friend Luke. In Luke, I opened the
documents tab, and started iterating over some terms for the field
'value' until I found 'compete'. When I hit 'Show All Docs', the search
tab opens and it displays the one and only document holding this value
(i.e. the document representing the word 'compete'). It shows the query:
'value:compete '. Then, when I hit the search button again (query is
still 'value:compete '), it says that there are no results!?
Probably, the 'Show All Docs' button does something different than
performing a query using the search tab in Luke.
Q: Can somebody explain why the injected original terms seem to get
ignored at query time? Or could it be related to the name of the search
field ('value'), or something else?
We use Lucene 3.1 with Solr analyzers (via Hibernate Search 3.4.2).
-Elmer