I am indexing some technical documentation and have been trying to add synonym matching to the searches. More precisely, I am adding the synonyms at index time so that any synonym matches at search time.

a. Simple synonyms (wordA = wordB) are working just fine.

b. Multiple synonyms (wordA = wordB = wordC) are almost working, but not quite. The problem appears to be that wordA and wordB are allocated the same position in the index vector, but wordC is shifted by 1 (and a wordD is shifted by 2, etc.). Thus phrase searches including one of the synonyms only work with a proximity modifier.

c. Synonym phrases (wordA = wordB wordC) are not working properly.

I have prepared a simple test case which can be downloaded here: https://www.dropbox.com/s/rn4np7ja4wcpodl/mydemo.zip?dl=0 (The download is ~5 MB because it includes the 3 Lucene JAR files which are required; these are from Lucene 9.2.0.)

Unzip the download into a directory called "mydemo" and compile & run it. The example assumes you have Ant; if you don't, it is a simple enough example that you should be able to emulate the steps after reading the build.xml file.

As well as the three library files, the zipfile provides four Java files, three trivial documents, a synonym list which is loaded by the indexing step, and two query lists for the search step.

The three documents are almost identical; they contain a sentence which variously contains "release note" (or "release notes" or "release notice"), and "document subtree" (or "sub tree" or "sub-tree").

The synonym list contains two sets of synonyms:

note,notes,notice,notification
subtree,sub tree,sub-tree

The query lists each contain about 10 queries. Every single query should match all three documents, but some of them do not.

a. The querylist query.rn.in shows that, where a term has multiple synonyms, their positions are shifted.
Thus "release note" and "release notes" match all three documents, but "release notice" only matches if the search term is "release notice"~1 (because notice is the second synonym and has been shifted one position), and "release notification" only matches if the term is "release notification"~2 (because notification is the third synonym and has been shifted two positions). The command "ant rnsearch" should run these searches and show the results.

b. The querylist query.st.in shows that the phrase "sub tree" is not being correctly identified as a synonym of the other two terms. The command "ant stsearch" should run these.

If anyone can point to what I am doing wrong in MyAnalyzer.java I would be extremely grateful.

Cheers
T
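
P.S. In case it helps to pin down the position problem, here is a toy model in plain Java of the positions I seem to be getting for the note/notes/notice/notification group. It uses no Lucene classes at all. The names (PhrasePositionSketch, positions, requiredSlop) are made up for illustration, and the slop rule is only a simplified assumption about how an in-order two-term phrase behaves, not Lucene's actual matching code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of term positions for "release <synonym>"; hypothetical, no Lucene involved. */
final class PhrasePositionSketch {

    /**
     * Assigns a position to "release" (0) and to every member of the synonym group.
     * stacked = true  : every synonym shares position 1 (the layout I expected).
     * stacked = false : the original term and the first synonym share position 1,
     *                   each later synonym is pushed one position further
     *                   (the layout I appear to be getting).
     */
    static Map<String, Integer> positions(List<String> group, boolean stacked) {
        Map<String, Integer> pos = new LinkedHashMap<>();
        pos.put("release", 0);
        for (int i = 0; i < group.size(); i++) {
            int shift = stacked ? 0 : Math.max(0, i - 1);
            pos.put(group.get(i), 1 + shift);
        }
        return pos;
    }

    /**
     * Assumed rule for this sketch: an exact two-term phrase needs the second term
     * exactly one position after the first; every extra position costs one unit of slop.
     */
    static int requiredSlop(Map<String, Integer> pos, String first, String second) {
        return Math.abs(pos.get(second) - pos.get(first) - 1);
    }

    public static void main(String[] args) {
        List<String> group = List.of("note", "notes", "notice", "notification");
        Map<String, Integer> shifted = positions(group, false);
        Map<String, Integer> stacked = positions(group, true);
        for (String term : group) {
            System.out.println("\"release " + term + "\""
                    + " shifted layout needs slop " + requiredSlop(shifted, "release", term)
                    + ", stacked layout needs slop " + requiredSlop(stacked, "release", term));
        }
    }
}
```

In this model the shifted layout needs slop 0, 0, 1 and 2 for note, notes, notice and notification, which mirrors what "ant rnsearch" shows, while the stacked layout needs slop 0 for all four, which is what I was hoping to get from the index.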