One thing to check is whether the synonyms are configured as bidirectional and, if not, which direction they go (e.g. is "a b" being expanded to "ab" while "ab" is not being expanded to "a b"?)
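
To make the direction question concrete, here is a rough plain-Java model of index-time expansion. It is only a sketch and does not use the Lucene API: the class name, the `rules` helper and the `expand` method are all made up for illustration, and it ignores token positions entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of index-time synonym expansion. NOT the Lucene API;
// every name in this class is hypothetical.
class SynonymDirectionSketch {

    // Builds a rule table: input token sequence -> extra tokens to emit.
    // With bidirectional=false only the "httpproxy" -> "http proxy" rule exists.
    static Map<List<String>, List<String>> rules(boolean bidirectional) {
        Map<List<String>, List<String>> r = new LinkedHashMap<>();
        r.put(List.of("httpproxy"), List.of("http", "proxy"));
        if (bidirectional) {
            r.put(List.of("http", "proxy"), List.of("httpproxy"));
        }
        return r;
    }

    // Keeps the original tokens and appends the output of every rule
    // whose input matches the token stream at some position.
    static List<String> expand(List<String> tokens,
                               Map<List<String>, List<String>> rules) {
        List<String> out = new ArrayList<>(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            for (Map.Entry<List<String>, List<String>> e : rules.entrySet()) {
                int end = i + e.getKey().size();
                if (end <= tokens.size()
                        && tokens.subList(i, end).equals(e.getKey())) {
                    out.addAll(e.getValue());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("http", "proxy", "server");
        // One direction only: the phrase in the document is never collapsed.
        System.out.println(expand(doc, rules(false))); // [http, proxy, server]
        // Both directions: "httpproxy" is now emitted for the index.
        System.out.println(expand(doc, rules(true)));  // [http, proxy, server, httpproxy]
    }
}
```

If the real map only has the single-word to phrase direction, an index-time-only setup would behave like the first case. As far as I recall, each call to `SynonymMap.Builder.add(input, output, includeOrig)` registers one direction only, so both directions need two calls.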

On Wed, Mar 5, 2025 at 2:20 PM Mikhail Khludnev <m...@apache.org> wrote:
>
> Hello Trevor.
>
> Maintaining such a synonym map is too much of a burden.
> One idea: stick words together with a "" separator using
> https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html
> Another idea, the opposite: break the user's words via a dictionary with
> https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html
> However, it's actually a suggester's duty
> https://lucene.apache.org/core//8_0_0/suggest/org/apache/lucene/search/spell/WordBreakSpellChecker.html
> however it's aside of the main search flow.
>
>
> On Wed, Mar 5, 2025 at 5:28 PM Trevor Nicholls <tre...@castingthevoid.com>
> wrote:
>
> > I don't know if I have completely the wrong idea or not, hopefully
> > somebody can point out where I have got this wrong
> >
> > I am indexing technical documentation; the content contains strings
> > like "http_proxy_server". When building the index my analyzer breaks
> > this into the tokens "http", "proxy" and "server". It generates the
> > same tokens for "http.proxy.server"; constructions like this are also
> > common in the documents.
> >
> > At the moment the application is using Lucene 8.6.3.
> >
> > If the document contains "http_proxy_server" the user can search for
> > "http proxy server", "http.proxy.server" or "http proxy server" and all
> > will match.
> >
> > However, I am trying to construct the index and the search so that if
> > the user searches for e.g. "http proxyserver" they also find a match. I
> > thought it would be sufficient to add an entry to the synonym map
> > specifying that "http proxy" and "httpproxy" are synonyms, and likewise
> > "proxy server" and "proxyserver". (When adding multiple-word phrases the
> > spaces are replaced by SynonymMap.WORD_SEPARATOR).
> >
> > The analyzer incorporates the synonym map when building the index, but
> > not when searching - the synonyms (both words and phrases) should
> > already be in the index so a user's search pattern should not need to
> > be extended by them.
> >
> > Unfortunately this doesn't appear to be working as I expected. If a
> > user searches for "httpproxy" or "proxyserver" nothing is matched.
> >
> > When I print the tokens in the stream emitted by the analyser, I can
> > see all the word for word synonyms output (e.g. if the content contains
> > "license", the emerging tokens include both "licence" and "license"),
> > but the phrase substitutions are not. "http", "proxy" and "server" are
> > there, but none of the conjunctions appear.
> >
> > I don't think synonym replacement should be occurring at search time,
> > if only for performance reasons, but what have I missed in how this
> > should work? Am I chasing the impossible dream?
> >
> > cheers
> >
> > T
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev