Re: Preventing phrase queries from matching across lines

Eric Jain Sat, 29 Apr 2006 05:28:25 -0700

Erik Hatcher wrote:

On Apr 28, 2006, at 5:35 AM, Eric Jain wrote:
What is the best way to prevent a phrase query such as "eggs white" matching "fried eggs\nwhite snow"?
Two possibilities I have thought about:

1. Replace all line breaks with a special string, e.g. "newline".
2. Have an analyzer somehow increment the position of a term for each line break it encounters.
The latter seems a bit more complicated to implement, but it would also be more efficient, right? Or are there better options?
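A minimal stand-alone sketch of option 1, assuming a plain whitespace tokenizer rather than a real Lucene analyzer: each line break becomes the sentinel term "newline", which occupies a position of its own, so "eggs" and "white" are no longer adjacent. `SentinelNewlines`, `tokenize` and `phraseMatches` are hypothetical names for illustration, and `phraseMatches` is a simplified stand-in for an exact (zero-slop) phrase query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of option 1 (not Lucene code): every line break
// is replaced by a sentinel token that takes up one position.
final class SentinelNewlines {
    static final String SENTINEL = "newline";

    /** Returns the tokens in order; a token's list index is its position. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        String[] lines = text.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                tokens.add(SENTINEL); // one extra term per line break
            }
            // trim() also drops a trailing '\r' from "\r\n" line endings
            for (String word : lines[i].trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    tokens.add(word.toLowerCase());
                }
            }
        }
        return tokens;
    }

    /** Exact phrase match: the phrase terms must occupy consecutive positions. */
    static boolean phraseMatches(List<String> tokens, String... phrase) {
        if (phrase.length == 0) return false;
        for (int start = 0; start + phrase.length <= tokens.size(); start++) {
            boolean ok = true;
            for (int j = 0; j < phrase.length && ok; j++) {
                ok = tokens.get(start + j).equals(phrase[j]);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("fried eggs\nwhite snow");
        System.out.println(tokens);                                // [fried, eggs, newline, white, snow]
        System.out.println(phraseMatches(tokens, "eggs", "white")); // false
        System.out.println(phraseMatches(tokens, "white", "snow")); // true
    }
}
```

One drawback of this design: a document containing the literal word "newline" collides with the sentinel, and every line break adds one more posting for that term, which is the index-size concern raised later in this thread.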
#2 shouldn't be too hard to implement, but you'll need to catch newlines in the initial tokenizer. I'm not sure about the efficiency; both options would require a tokenizer that detects newlines and either injects a new term or sets a flag so that the next term gets a position increment bump.
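The flag-based approach can be modeled in a few lines. This is a sketch under stated assumptions, not Lucene code: `LineGapPositions`, `Posting`, `tokenize` and `lineGap` are hypothetical names, and positions are tracked by hand. In a Lucene analyzer the same effect would come from raising the position increment of the first token after a line break (`Token.setPositionIncrement` in the 2006-era API, `PositionIncrementAttribute` in later versions).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of option 2 (not Lucene code): no extra term is
// stored; the first token after a line break gets a larger position increment.
final class LineGapPositions {
    /** A term together with its position in the field. */
    static final class Posting {
        final String term;
        final int position;

        Posting(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /** Whitespace tokenizer that adds lineGap to the increment after a '\n'. */
    static List<Posting> tokenize(String text, int lineGap) {
        List<Posting> out = new ArrayList<>();
        int position = -1;          // position of the previous token
        boolean sawNewline = false; // the flag for the next token
        StringBuilder word = new StringBuilder();
        // Iterate one step past the end so a trailing word is flushed.
        for (int i = 0; i <= text.length(); i++) {
            char c = i < text.length() ? text.charAt(i) : ' ';
            if (Character.isWhitespace(c)) {
                if (word.length() > 0) {
                    position += 1 + (sawNewline ? lineGap : 0);
                    out.add(new Posting(word.toString().toLowerCase(), position));
                    word.setLength(0);
                    sawNewline = false;
                }
                if (c == '\n') {
                    sawNewline = true; // bump the next token's increment
                }
            } else {
                word.append(c);
            }
        }
        return out;
    }

    /** Exact phrase: phrase term j must sit at position(first) + j. */
    static boolean phraseMatches(List<Posting> postings, String... phrase) {
        if (phrase.length == 0) return false;
        for (Posting first : postings) {
            if (!first.term.equals(phrase[0])) continue;
            boolean ok = true;
            for (int j = 1; j < phrase.length && ok; j++) {
                ok = hasTermAt(postings, phrase[j], first.position + j);
            }
            if (ok) return true;
        }
        return false;
    }

    static boolean hasTermAt(List<Posting> postings, String term, int position) {
        for (Posting p : postings) {
            if (p.position == position && p.term.equals(term)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Posting> postings = tokenize("fried eggs\nwhite snow", 1);
        for (Posting p : postings) {
            System.out.println(p.term + "@" + p.position);
        }
        System.out.println(phraseMatches(postings, "eggs", "white"));
        System.out.println(phraseMatches(postings, "white", "snow"));
    }
}
```

With `lineGap = 1` the terms come out as fried@0, eggs@1, white@3, snow@4, so "eggs white" fails while "white snow" still matches, and no extra term is written to the index. A sloppy phrase query with a slop of at least `lineGap` could still bridge the gap, so a larger gap may be worth choosing if such queries are used.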

Thanks, #2 turned out to be easier to implement than expected. I should have specified that the "efficiency" I was concerned about was not the efficiency of the tokenization, but the impact of having all those additional "newline" term positions in the index.

