Thanks everyone for the insight. I guess I'll use BooleanQuery then.

There is also a caveat I noticed (not sure whether it's an issue), which
is slightly different from the one in the mentioned thread. Say I have a
multi-word synonym pair, "wifi router" and "internet device". Using
SynonymGraphFilter at query time (I already escaped the space with a
backslash when building the SynonymMap) produces this TokenStream for the
query "wifi router":

"wifi" (PositionIncrement=1,PositionLength=1), "internet"
(PositionIncrement=0,PositionLength=1), "router"
(PositionIncrement=1,PositionLength=1), "device"
(PositionIncrement=0,PositionLength=1)

This has the same effect as if I had 2 single-word synonym pairs:
"wifi"/"internet" and "router"/"device". Converted to a BooleanQuery, it
becomes ("wifi" OR "internet") AND ("router" OR "device"), but what I would
like to achieve is ("wifi" AND "router") OR ("internet" AND "device").
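
To make the difference concrete, here is a minimal plain-Java sketch (no
Lucene involved). The Token class and the helper methods are hypothetical
stand-ins I made up for illustration, not Lucene APIs, and documents are
modelled as bare sets of terms. It groups the tokens above by position and
then checks both boolean forms against a document containing only "wifi"
and "device":

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FlattenedSynonymSketch {

    // Hypothetical stand-in for one token of the stream quoted above.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // A token with PositionIncrement=0 shares the position of the previous token.
    // Assumption: increments larger than 1 (position gaps) are not modelled.
    static List<Set<String>> groupByPosition(List<Token> tokens) {
        List<Set<String>> positions = new ArrayList<>();
        for (Token t : tokens) {
            if (t.positionIncrement > 0 || positions.isEmpty()) {
                positions.add(new LinkedHashSet<>());
            }
            positions.get(positions.size() - 1).add(t.term);
        }
        return positions;
    }

    // (a OR b) AND (c OR d): every position needs at least one matching term.
    static boolean matchesFlattened(List<Set<String>> positions, Set<String> docTerms) {
        for (Set<String> alternatives : positions) {
            boolean any = false;
            for (String term : alternatives) {
                if (docTerms.contains(term)) {
                    any = true;
                    break;
                }
            }
            if (!any) {
                return false;
            }
        }
        return true;
    }

    // (a AND c) OR (b AND d): all terms of at least one phrase must be present.
    static boolean matchesAnyPhrase(List<List<String>> phrases, Set<String> docTerms) {
        for (List<String> phrase : phrases) {
            if (docTerms.containsAll(phrase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> stream = Arrays.asList(
            new Token("wifi", 1), new Token("internet", 0),
            new Token("router", 1), new Token("device", 0));
        List<Set<String>> positions = groupByPosition(stream);
        List<List<String>> phrases = Arrays.asList(
            Arrays.asList("wifi", "router"),
            Arrays.asList("internet", "device"));
        Set<String> doc = new HashSet<>(Arrays.asList("wifi", "device"));

        System.out.println(positions);                          // [[wifi, internet], [router, device]]
        System.out.println(matchesFlattened(positions, doc));   // true
        System.out.println(matchesAnyPhrase(phrases, doc));     // false
    }
}
```

So a document that only mixes the two synonyms ("wifi device") is matched by
the flattened form but not by the form I actually want.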

I'm curious whether there is a workaround for this case.

Thanks,
Anh Dung Bui


On Thu, Dec 29, 2022 at 4:56 AM Michael Wechner <michael.wech...@wyona.com>
wrote:

> Hi Anh
>
> The following Stackoverflow link might help
>
>
> https://stackoverflow.com/questions/73240494/can-someone-assist-me-with-a-multi-word-synonym-problem-in-lucene
>
> The following thread seems to confirm that escaping the space with a
> backslash does not help
>
> https://lists.apache.org/list?java-user@lucene.apache.org:2022-3
>
> HTH
>
> Michael
>
>
> Am 27.12.22 um 20:22 schrieb Anh Dũng Bùi:
> > Hi Lucene users,
> >
> > I recently came across SynonymQuery and found out that it only supports
> > single-term synonyms (since it accepts a list of Term which will be
> > considered as synonyms). We have some multi-term synonyms like "internet
> > device" <-> "wifi router" or "dns" <-> "domain name service". Am I right
> > that I need to use something like a BooleanQuery for these cases?
> >
> > I have 2 other follow-up questions:
> > - Does SynonymQuery have any advantage over BooleanQuery? Or is it only
> > different in how scores are computed? As I understand SynonymWeight will
> > consider all terms as exactly the same while BooleanQuery will favor the
> > documents with more matched terms.
> > - Is it worth it to support multi-term synonyms in SynonymQuery? My feeling
> > is that it's better to just use BooleanQuery in those cases, since to
> > support multi-term synonyms it needs to accept a list of Query, which would
> > make it behave like a BooleanQuery. Also how scoring works with multi-term
> > is another problem.
> >
> > Thanks & Regards!
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
