phrase search with custom TokenFilter

Embry, Clay Mon, 10 Mar 2008 11:29:22 -0700

Hi, I have written a TokenFilter which breaks up words with internal dot 
characters and adds the whole word plus the pieces as tokens in the stream. I 
am using that TokenFilter with the StandardAnalyzer to index my documents. Then 
I do searches using the StandardAnalyzer. Everything is working great except 
for some phrase searches. Here's an example:


Document string
---------------
entity-cache.size-limit

StandardAnalyzer token - position increment
-------------------------------------------
(entity,0,6,type=<alphanum>) - 1
(cache.size,7,17,type=<host>) - 1
(limit,18,23,type=<alphanum>) - 1


MyAnalyzer token - position increment
-------------------------------------
(entity,0,6,type=<alphanum>) - 1
(cache.size,7,17,type=<host>) - 1
(limit,18,23,type=<alphanum>) - 1
(cache,7,12,type=<alphanum>) - 1
(size,13,17,type=<alphanum>) - 1



Search string (StandardAnalyzer)
--------------------------------
"cache.size limit"



The search finds the doc if I index with the StandardAnalyzer, but not if I
index with MyAnalyzer. Can anyone see why that would be? The first three
Tokens of each TokenStream are exactly the same, so it looks like the phrase
should match in both cases. Do I need to change the position increments on my
extra Tokens or something?
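
One way to reason about this is to model how positions follow from position increments. The sketch below is plain Java and does not call Lucene; the class and method names are hypothetical. It assumes that each token's position is the running sum of the increments (with the first token at position 0) and that an exact phrase needs its terms at consecutive positions. It also assumes, for the failing case, that the filter emits the pieces right after the whole word rather than at the end of the stream as listed above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of token positions; hypothetical names, no Lucene calls.
final class PhrasePositionSketch {

    // Each token's position is the running sum of the position increments.
    // Starting at -1 puts a first token with increment 1 at position 0.
    static Map<String, List<Integer>> positions(String[] terms, int[] increments) {
        Map<String, List<Integer>> index = new HashMap<>();
        int position = -1;
        for (int i = 0; i < terms.length; i++) {
            position += increments[i];
            index.computeIfAbsent(terms[i], k -> new ArrayList<>()).add(position);
        }
        return index;
    }

    // Exact phrase match (no slop): phrase term k must sit at start + k
    // for some position "start" of the first phrase term.
    static boolean phraseMatches(Map<String, List<Integer>> index, String... phrase) {
        List<Integer> starts = index.getOrDefault(phrase[0], Collections.<Integer>emptyList());
        for (int start : starts) {
            boolean allAdjacent = true;
            for (int k = 1; k < phrase.length; k++) {
                List<Integer> candidates =
                        index.getOrDefault(phrase[k], Collections.<Integer>emptyList());
                if (!candidates.contains(start + k)) {
                    allAdjacent = false;
                    break;
                }
            }
            if (allAdjacent) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, a stream of entity, cache.size, cache, size, limit with every increment set to 1 puts cache.size at position 1 and limit at position 4, so the phrase "cache.size limit" does not match. Giving the two extra tokens an increment of 0 stacks them on position 1 and leaves limit at position 2, so the phrase matches again.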



Thanks for any help.

==

Clay Embry
