I'm investigating various ways of supporting synonyms in Lucene.

One approach that looks interesting is a kind of "query expansion".

For example, if the user searches for "us 1888", one might expand the query as follows:

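    // "us 1888" expanded so that "us" also matches the phrase "united states"
    SpanNearQuery query =
        new SpanNearQuery(
            new SpanQuery[]
            {
                // first position: either the original term or its synonym phrase
                new SpanOrQuery(
                    new SpanTermQuery(new Term("Plaintext", "us")),
                    new SpanNearQuery(
                        new SpanQuery[]
                        {
                            new SpanTermQuery(new Term("Plaintext", "united")),
                            new SpanTermQuery(new Term("Plaintext", "states"))
                        },
                        0,    // slop: terms must be adjacent
                        true  // inOrder: "united" must come before "states"
                    )
                ),
                // second position: unchanged term
                new SpanTermQuery(new Term("Plaintext", "1888"))
            },
            0,    // slop: clauses must be adjacent
            true  // inOrder
        );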
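To make the general shape of the expansion concrete, here is a minimal sketch in plain Java with no Lucene dependency. It only plans the expansion: for each query token it lists the alternatives (the token itself plus any synonym phrases) and prints the resulting tree. The class `SynonymExpansionSketch`, its methods, and the `term`/`or`/`near` notation are all hypothetical names made up for illustration; the assumed mapping to the query above is `term` to SpanTermQuery, `or` to SpanOrQuery, and `near` to SpanNearQuery with slop 0 and inOrder true.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: plans a synonym expansion
// before any SpanQuery objects are built.
final class SynonymExpansionSketch {

    // Hypothetical synonym table: token -> alternative phrases (each a list of terms).
    private final Map<String, List<List<String>>> synonyms = new HashMap<>();

    void addSynonym(String token, String... phraseTerms) {
        synonyms.computeIfAbsent(token, k -> new ArrayList<>())
                .add(Arrays.asList(phraseTerms));
    }

    // For each query token, collect the alternatives for that position.
    // The first alternative is always the original token itself.
    List<List<List<String>>> expand(String... queryTokens) {
        List<List<List<String>>> positions = new ArrayList<>();
        for (String token : queryTokens) {
            List<List<String>> alternatives = new ArrayList<>();
            alternatives.add(Collections.singletonList(token));
            alternatives.addAll(
                synonyms.getOrDefault(token, Collections.emptyList()));
            positions.add(alternatives);
        }
        return positions;
    }

    // Render the plan as a nested expression that mirrors the query tree.
    String describe(List<List<List<String>>> positions) {
        List<String> clauses = new ArrayList<>();
        for (List<List<String>> alternatives : positions) {
            List<String> rendered = new ArrayList<>();
            for (List<String> phrase : alternatives) {
                rendered.add(describePhrase(phrase));
            }
            // A position with a single alternative needs no "or" wrapper.
            clauses.add(rendered.size() == 1
                ? rendered.get(0)
                : "or(" + String.join(", ", rendered) + ")");
        }
        return "near(" + String.join(", ", clauses) + ")";
    }

    // A one-term phrase is a plain term; a longer phrase is a nested "near".
    private static String describePhrase(List<String> phrase) {
        if (phrase.size() == 1) {
            return "term(" + phrase.get(0) + ")";
        }
        List<String> terms = new ArrayList<>();
        for (String t : phrase) {
            terms.add("term(" + t + ")");
        }
        return "near(" + String.join(", ", terms) + ")";
    }
}
```

With a single synonym registered via `addSynonym("us", "united", "states")`, calling `describe(expand("us", "1888"))` returns `near(or(term(us), near(term(united), term(states))), term(1888))`, which has the same shape as the hand-written query. Each extra synonym adds one more alternative under the `or` for its position, which is where the additional query cost would come from.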

A couple of questions:

- Is this approach in use within the community?
- Are there "gotchas" with this approach that make it undesirable?

I've run a few quick query-performance tests on a test index and found that a query can indeed take 10x longer if enough synonyms are used. With a baseline search time of around 1 ms, 10 ms is still plenty fast. That said, my test index is only 70 MB, so those 10 ms might turn into something nasty on a 7 GB index.
