I know there's a ton of documentation about the query parser whitespace
issue, and a fair bit of info on the PositionLengthAttribute issue, but I
seem to have stumbled upon a new issue with multi-term synonyms: they
don't seem to play well with a bunch of tokens stacked in the same
position.

I have a synonym filter with this expansion:
side table,end table

I can see the synonym is applied when looking at the token stream output
for "side table".  Today I decided to add another SynonymFilter
immediately before that one, with WordNet synonym expansions.  As
expected, WordNet bloats the token stream, but all of a sudden the
original "end table" expansion no longer gets applied.  I see "side"
followed by a bunch of tokens in the same position, then a couple of new
tokens in the next position, then "table" in that same position, followed
by some more new tokens also in that position.  Since "side" is still
adjacent to "table" in token positions, I would expect the synonym to hit.
Is this a known issue (and if so, what's the Jira)?  The impact seems
significant: since WordNet is so comprehensive, it's likely to cause this
problem for most of my multi-term synonyms.  Maybe the workaround is to
apply multi-term synonyms first as far as possible, although I don't know
if you have that kind of control when all your synonyms are applied by a
single SynonymFilter.
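
To make the symptom concrete, here is a toy model in Python.  It is not
Lucene code: the extra WordNet-style terms are made up, and the idea that
the second filter matches consecutive tokens in stream order rather than
by position is only my guess at what might be going on.

```python
from typing import Dict, List, Set, Tuple

# A token is (term, position increment); an increment of 0 means the token
# is stacked on the same position as the previous one.
Token = Tuple[str, int]


def positions(tokens: List[Token]) -> List[Tuple[str, int]]:
    """Convert position increments into absolute positions."""
    out = []
    pos = -1
    for term, inc in tokens:
        pos += inc
        out.append((term, pos))
    return out


def matches_in_stream_order(tokens: List[Token], phrase: List[str]) -> bool:
    """ASSUMED behaviour: the phrase must appear as consecutive tokens in
    the order they arrive, ignoring position increments."""
    terms = [term for term, _ in tokens]
    n = len(phrase)
    return any(terms[i:i + n] == phrase for i in range(len(terms) - n + 1))


def matches_by_position(tokens: List[Token], phrase: List[str]) -> bool:
    """What I expected: the phrase terms sit at consecutive positions."""
    by_pos: Dict[int, Set[str]] = {}
    for term, pos in positions(tokens):
        by_pos.setdefault(pos, set()).add(term)
    return any(
        all(term in by_pos.get(start + k, set()) for k, term in enumerate(phrase))
        for start in by_pos
    )


# Stream before adding the WordNet filter.
plain = [("side", 1), ("table", 1)]

# Stream after it; the extra terms are hypothetical stand-ins.
bloated = [
    ("side", 1), ("flank", 0), ("edge", 0),
    ("board", 1), ("desk", 0), ("table", 0), ("chart", 0),
]

rule = ["side", "table"]
print(matches_in_stream_order(plain, rule))    # True
print(matches_in_stream_order(bloated, rule))  # False
print(matches_by_position(bloated, rule))      # True
```

In this model "side" and "table" are still at adjacent positions in the
bloated stream, but they are no longer neighbours in stream order, which
would explain why the rule stops firing.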

Thanks,
Ryan
