Thanks for your suggestion, Ian, but I just found out that if I replace the KeywordTokenizer with a WhitespaceTokenizer, everything seems to work fine.

Just to test what happens, I created another field 'orig', using this analyzer:
analyzer KeywordLowered{
    tokenizer = KeywordTokenizer
    tokenfilter = LowerCaseFilter
}

Guess what: exactly the same problem, also in Luke.
It finds no documents for the query:
orig:strange
even though the term 'strange' is in the index for the field 'orig'.

Does anybody have a clue why documents are not matched when using the KeywordTokenizer? Remember that none of the queries or terms contain whitespace.


Thanks again.
-Elmer


On 04/25/2012 02:53 PM, Ian Lea wrote:
You seem to be quietly going round in circles, by yourself!  I suggest
a small self-contained program/test case with a RAM index created from
scratch.  You can then experiment with inject on or off and if you
still can't figure it out, post the code and hopefully someone will be
able to help you make sense of it.

Make sure you tell us what version of Lucene you are using.  If not
the latest, wouldn't hurt to try with the latest.


--
Ian.


On Wed, Apr 25, 2012 at 1:22 PM, Elmer van Chastelet
<evanchaste...@gmail.com>  wrote:
I keep replying to myself, and it is all getting a bit confusing.
The problem still exists, and I don't understand why, or why it worked once.

I have the same behavior again as posted in my first mail:
- Inject parameter is set to true.
- The index has _no deleted documents_ and is optimized.
- The term 'compete' is in there.
- If I ask Luke to show all docs for term 'compete' it shows me the one and
only document that represents this word. But...
- If I then run the query 'value:compete' in Luke, it says there are no
results.

Here is the index I'm currently using. It contains various fields for the
available phonetic filter encoders:
https://www.box.com/s/34212e82227e102f6734

Can somebody explain this behavior? What's the real use of the inject
parameter of the PhoneticFilterFactory?

Thanks in advance.

-Elmer


On 04/25/2012 12:25 PM, Elmer van Chastelet wrote:
Problem solved. Long story short: for some reason I had deleted documents
in the index and the non-deleted documents used the phonetic filter with
inject set to false.

Works fine now :)

On 04/23/2012 09:27 PM, Elmer van Chastelet wrote:
Hi all,

(scroll to bottom for question)

I was setting up a simple web app to play around with phonetic filters.
The idea is simple: I create a document for each word in the English
dictionary, each document containing a single search field holding the value
after it is preprocessed using the following analyzer definition (in our own
DSL syntax, which gets transformed to Java):

analyzer soundslike{
    tokenizer = KeywordTokenizer
    tokenfilter = LowerCaseFilter
    tokenfilter = PhoneticFilter(encoder="DoubleMetaphone", inject="true")
}

I can run the web app and I get results that indeed (in some way) sound
like the original query term.

But what confuses me is the ranking of the results, knowing that I set
the inject param to true. If I search for the query term 'compete', the
parsed query becomes '(value:KMPT value:compete)', and therefore I expect
the word 'compete' to be ranked higher in the list than any other word...
but this wasn't the case.
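
To make that expectation concrete, here is a toy model of the analyzer chain
above. It is only a sketch and does not use Lucene or Solr: the class name
InjectSketch, the analyze method and the lookup table standing in for a real
DoubleMetaphone encoder are all made up for illustration. The single table
entry is the encoding quoted in the parsed query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class InjectSketch {
    // Toy stand-in for the chain: keyword tokenization (the whole input is
    // one token), lowercasing, then phonetic encoding with optional inject.
    static List<String> analyze(String input, Function<String, String> encoder, boolean inject) {
        String token = input.toLowerCase(Locale.ROOT);
        String code = encoder.apply(token);
        List<String> terms = new ArrayList<>();
        if (code == null || code.isEmpty() || code.equals(token)) {
            terms.add(token); // no usable encoding: keep the original term only
            return terms;
        }
        terms.add(code);      // the phonetic code is always emitted
        if (inject) {
            terms.add(token); // inject=true additionally keeps the original term
        }
        return terms;
    }

    public static void main(String[] args) {
        // ASSUMPTION: a hypothetical lookup table replaces the real encoder.
        Map<String, String> table = new HashMap<>();
        table.put("compete", "KMPT");
        Function<String, String> encoder = table::get;

        System.out.println(analyze("Compete", encoder, true));  // prints [KMPT, compete]
        System.out.println(analyze("Compete", encoder, false)); // prints [KMPT]
    }
}
```

With inject on, both terms should end up in the index and in the query, so a
document holding the literal term would be expected to match two clauses
instead of one.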

Looking further at the explanation of results, I saw that the term
'compete' in the parsed query is totally absent, and only the phonetic
encoding seems to affect the ranking:

  * COMPETITOR
      o 4.368826 = (MATCH) sum of:
          + 4.368826 = (MATCH) weight(value:KMPT in 3174), product of:
              # 0.52838135 = queryWeight(value:KMPT), product of:
                  * 8.26832 = idf(docFreq=150, maxDocs=216555)
                  * 0.063904315 = queryNorm
              # 8.26832 = (MATCH) fieldWeight(value:KMPT in 3174),
                product of:
                  * 1.0 = tf(termFreq(value:KMPT)=1)
                  * 8.26832 = idf(docFreq=150, maxDocs=216555)
                  * 1.0 = fieldNorm(field=value, doc=3174)
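
The numbers in this explanation can be re-derived by hand. The sketch below
assumes the default similarity formulas, idf = 1 + ln(maxDocs / (docFreq + 1))
and queryNorm = 1 / sqrt(sum of squared clause weights), with no boosts. The
class and method names are made up for illustration. One thing worth noting:
the reported queryNorm is reproduced if the second clause, value:compete, is
given an idf computed with docFreq=0. That docFreq=0 is an inference from the
arithmetic, not something the explanation itself states.

```java
class ExplainCheck {
    // Default-similarity style idf: 1 + ln(maxDocs / (docFreq + 1)).
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log(maxDocs / (double) (docFreq + 1));
    }

    // Query norm for unboosted term clauses: 1 / sqrt(sum of idf^2).
    static double queryNorm(double... idfs) {
        double sumOfSquares = 0.0;
        for (double v : idfs) {
            sumOfSquares += v * v;
        }
        return 1.0 / Math.sqrt(sumOfSquares);
    }

    // Score of one matching term clause: queryWeight * fieldWeight.
    static double score(double idf, double norm, double tf, double fieldNorm) {
        double queryWeight = idf * norm;
        double fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        int maxDocs = 216555;
        double idfKmpt = idf(150, maxDocs); // docFreq taken from the explanation
        double idfOrig = idf(0, maxDocs);   // ASSUMPTION: value:compete with docFreq=0
        double norm = queryNorm(idfKmpt, idfOrig);

        System.out.printf("idf(value:KMPT) = %.5f%n", idfKmpt);
        System.out.printf("queryNorm       = %.7f%n", norm);
        System.out.printf("score           = %.5f%n", score(idfKmpt, norm, 1.0, 1.0));
    }
}
```

The values should agree with the explanation up to float rounding. If the
docFreq=0 reading is right, the searcher did not see the literal term
'compete' in the field 'value' at query time, which would fit with the
clause being absent from the explanation.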

The next thing I did was to run our friend Luke. In Luke, I opened the
documents tab, and started iterating over some terms for the field 'value'
until I found 'compete'. When I hit 'Show All Docs', the search tab opens
and it displays the one and only document holding this value (i.e. the
document representing the word 'compete'). It shows the query:
'value:compete '. Then, when I hit the search button again (query is still
'value:compete '), it says that there are no results !?

Probably, the 'Show All Docs' button does something different than
performing a query using the search tab in Luke.

Q: Can somebody explain why the injected original terms seem to get
ignored at query time? Or may it be related to the name of the search field
('value'), or something else?

We use Lucene 3.1 with SOLR analyzers (by Hibernate Search 3.4.2).

-Elmer


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


