I'm investigating various ways of supporting synonyms in Lucene.
One such approach that looks potentially interesting is to do a kind of
"query expansion".
For example, if the user searches for "us 1888", one might expand the
query as follows:
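    SpanNearQuery query =
        new SpanNearQuery(
            new SpanQuery[]
            {
                // first position: either "us" or the exact phrase "united states"
                new SpanOrQuery(
                    new SpanTermQuery(new Term("Plaintext", "us")),
                    new SpanNearQuery(
                        new SpanQuery[]
                        {
                            new SpanTermQuery(new Term("Plaintext", "united")),
                            new SpanTermQuery(new Term("Plaintext", "states"))
                        },
                        0,    // slop: no intervening terms
                        true  // inOrder: clauses must match in this order
                    )
                ),
                // second position: the unexpanded term
                new SpanTermQuery(new Term("Plaintext", "1888"))
            },
            0,
            true
        );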
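
To make the shape of the expansion concrete, here is a minimal sketch of how
such a query tree could be generated from a synonym table instead of being
written by hand. It is deliberately Lucene-free so it runs on its own: the
term/near/or helpers just build strings and stand in for SpanTermQuery, an
in-order slop-0 SpanNearQuery, and SpanOrQuery respectively. The class name,
the SYNONYMS table, and the whitespace tokenization are all assumptions made
for illustration, not anything from Lucene itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SynonymExpansionSketch {

    // Hypothetical synonym table: one token -> alternative phrases (token lists).
    static final Map<String, List<List<String>>> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("us", Arrays.asList(Arrays.asList("united", "states")));
    }

    // Stand-in for SpanTermQuery.
    static String term(String field, String text) {
        return "term(" + field + ":" + text + ")";
    }

    // Stand-in for an in-order, slop-0 SpanNearQuery; a single clause is returned unwrapped.
    static String near(List<String> clauses) {
        return clauses.size() == 1 ? clauses.get(0) : "near(" + String.join(", ", clauses) + ")";
    }

    // Stand-in for SpanOrQuery; a single clause is returned unwrapped.
    static String or(List<String> clauses) {
        return clauses.size() == 1 ? clauses.get(0) : "or(" + String.join(", ", clauses) + ")";
    }

    // A multi-token synonym becomes an exact phrase.
    static String phrase(String field, List<String> tokens) {
        List<String> terms = new ArrayList<>();
        for (String t : tokens) {
            terms.add(term(field, t));
        }
        return near(terms);
    }

    // Each query token becomes an OR over itself and its synonyms;
    // the per-token clauses are then chained as one ordered phrase.
    static String expand(String field, String userQuery) {
        List<String> positions = new ArrayList<>();
        for (String token : userQuery.trim().split("\\s+")) {
            List<String> alternatives = new ArrayList<>();
            alternatives.add(term(field, token));
            for (List<String> synonym : SYNONYMS.getOrDefault(token, new ArrayList<>())) {
                alternatives.add(phrase(field, synonym));
            }
            positions.add(or(alternatives));
        }
        return near(positions);
    }

    public static void main(String[] args) {
        System.out.println(expand("Plaintext", "us 1888"));
    }
}
```

For "us 1888" this produces the same nesting as the hand-built query above: an
outer near whose first clause is an or over "us" and the phrase "united
states", followed by the plain term "1888". Tokens without an entry in the
table stay as single terms, so the number of extra clauses grows only with the
synonyms actually present.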
A couple of questions:
- Is this approach in use within the community?
- Are there "gotchas" with this approach that make it undesirable?
I've done a few quick tests of query performance on a test index and
found that a query can indeed take 10x longer if enough synonyms are
used, but if the baseline search time is around 1 ms, then 10 ms is
still plenty fast. (That said, my test was on a 70 MB index, so
my 10 ms might turn into something nasty with a 7 GB index.)