> Looks to me like MultiPhraseQuery is getting in the way. Shingles
> that begin at the same word are given the same position by
> ShingleFilter, and Solr's FieldQParserPlugin creates a
> MultiPhraseQuery when it encounters tokens in a query with the same
> position. I think what you want is to convert queries into shingle
> disjunctions (*any* matching shingle results in a hit), right?
Yes, you're right, Steve. Thank you.
One way, I see now, to get the behaviour I want is to set the unigrams'
positionIncrement to zero instead of one.
For example, in ShingleFilter.fillOutputBuffer(..), replace the two
occurrences of
> .setPositionIncrement(1);
with
> .setPositionIncrement(0);
Then I end up with a MultiPhraseQuery with
termArrays[0] = { list_entry_shingles:abcd
list_entry_shingles:abcd efgh
list_entry_shingles:abcd efgh ijkl
list_entry_shingles:efgh
list_entry_shingles:efgh ijkl
list_entry_shingles:ijkl }
and it works perfectly :-)
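To illustrate why the change has this effect, here is a minimal sketch in plain Java. It does not use Lucene: groupShinglesByPosition and its parameters are hypothetical names, and it assumes a simplified model in which a token's position is the running sum of position increments, with shingles longer than one word always sharing the position of the unigram they start with. Tokens that land on the same position end up in the same term array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShinglePositionSketch {

    // Hypothetical stand-in for the token stream: builds every shingle of up to
    // maxShingleSize words and groups the resulting terms by position.
    public static Map<Integer, List<String>> groupShinglesByPosition(
            List<String> words, int maxShingleSize, int unigramPositionIncrement) {
        Map<Integer, List<String>> byPosition = new TreeMap<>();
        int position = 0;
        for (int start = 0; start < words.size(); start++) {
            // Only the unigram advances the position; longer shingles that
            // begin at the same word reuse it (increment 0).
            position += unigramPositionIncrement;
            List<String> group =
                    byPosition.computeIfAbsent(position, k -> new ArrayList<>());
            StringBuilder shingle = new StringBuilder();
            int end = Math.min(words.size(), start + maxShingleSize);
            for (int i = start; i < end; i++) {
                if (i > start) {
                    shingle.append(' ');
                }
                shingle.append(words.get(i));
                group.add(shingle.toString());
            }
        }
        return byPosition;
    }

    public static void main(String[] args) {
        List<String> words = List.of("abcd", "efgh", "ijkl");
        // Increment 1: one group of terms per starting word.
        System.out.println(groupShinglesByPosition(words, 3, 1));
        // Increment 0: every shingle shares a single position.
        System.out.println(groupShinglesByPosition(words, 3, 0));
    }
}
```

With an increment of 1 the three words give three separate groups, which is the case where a phrase match across positions is required. With an increment of 0 all six shingles fall into one group, matching the single termArrays[0] listed above.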
I see no way of configuring this behaviour though.
If it is possible, and someone can say how, that would be a real godsend.
Otherwise, would a patch to ShingleFilter that offers an option
"unigramPositionIncrement" (defaulting to 1) likely be accepted into
trunk?
~mck
--
"Between two evils, I always pick the one I never tried before." Mae
West
| semb.wever.org | sesat.no | sesam.no |
