Hi Preetam,

On 07/14/2008 at 1:40 PM, Preetam Rao wrote:
> Is there a query in Lucene which matches sub phrases ?
> 
[snip]
> 
> I was redirected to Shingle filter which is a token filter
> that spits out n-grams. But it does not seem to be best solution
> since one does not know in advance what n in n-grams should be.

You could guess at the useful range, though, and then have ((max n)-(min n)+1) 
fields, scaling the boost for each with the corresponding value of n.  

Just using 2-grams could be good enough, since the longer the sub-phrase match, 
the more matching 2-grams.

> Also it means one has to get all these bi grams and then construct
> a boolean OR query which is not very efficient either.

In terms of your requirements, though, I think you're stuck with this 
inefficiency, no matter what solution you end up with; you need to do some form 
of term combination in your queries.  And the ShingleFilter approach doesn't 
compare badly here, since positions for phrase queries don't have to be looked 
up during scoring.  

If index space efficiency is a concern, though, the one-field-per-value-of-n 
solution I mentioned above could pose a problem.

Steve

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to