Welcome John! Basically, the tricky part about this issue is how the Analyzer integrates into the parsing workflow: it is as hossman says on the issue.

You can edit the .jflex file so that _TERM_CHAR is defined differently and regenerate, and you will see what I mean from the tests that fail. The crux of the problem is that currently, if you have +foo bar -baz, we split on whitespace and apply the operators, then run the analyzer on each portion. So you get +foo, bar, and -baz, and then we analyze foo, bar, and baz separately. But if you just remove the whitespace tokenization, you get +foo bar and -baz, which is different. So to make this kind of thing work as expected, I think the analyzer would have to be integrated at an earlier stage, before the operators are applied, i.e. as part of the lexing process.
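Here is a rough, untested sketch of the difference (assuming Lucene 4.x-era APIs; the class name, the field name "body", and the little split/strip loop are just illustration, not the parser's actual code):

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class WhitespaceSplitDemo {

  // Print the tokens an analyzer produces for one piece of query text.
  static void analyze(Analyzer analyzer, String text) throws IOException {
    TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    StringBuilder tokens = new StringBuilder();
    while (ts.incrementToken()) {
      tokens.append('[').append(term).append(']');
    }
    ts.end();
    ts.close();
    System.out.println("\"" + text + "\" -> " + tokens);
  }

  public static void main(String[] args) throws IOException {
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);

    // Roughly what happens today: split on whitespace, peel off the +/-
    // operators, then analyze each bare term in isolation. The analyzer
    // never sees more than one word at a time.
    for (String chunk : "+foo bar -baz".split("\\s+")) {
      analyze(analyzer, chunk.replaceFirst("^[+-]", ""));
    }

    // What a multi-word-aware integration would need: the words reach the
    // analyzer together, so a filter further down the chain (e.g. a
    // SynonymFilter) could match across them.
    analyze(analyzer, "foo bar");
  }
}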
NOTE: I definitely don't want to discourage you from tackling this issue, but I think it's fair to mention that there is a workaround: if you can preprocess your queries yourself (maybe you don't expose all of the Lucene syntax to your users, or something like that), you can escape the whitespace yourself, e.g. rain\ coat, and I think your synonyms will work as expected.
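Something along these lines (again untested, Lucene 4.x-era APIs, with the field name and synonym rule made up for the example) shows why the escaping helps: with the space escaped, "rain coat" reaches the analyzer as one chunk, both tokens flow through the SynonymFilter together, and a rule like rain coat => rain_coat can fire.

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.synonym.SynonymFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.util.CharsRef;
import org.apache.lucene.util.Version;

public class EscapedWhitespaceWorkaround {
  public static void main(String[] args) throws Exception {
    // One rule: "rain coat" => "rain_coat". Multi-word inputs are joined
    // with SynonymMap.WORD_SEPARATOR (the \u0000 character).
    SynonymMap.Builder builder = new SynonymMap.Builder(true);
    builder.add(new CharsRef("rain" + SynonymMap.WORD_SEPARATOR + "coat"),
                new CharsRef("rain_coat"), false);
    final SynonymMap synonyms = builder.build();

    Analyzer analyzer = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String field, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_40, reader);
        TokenStream sink = new SynonymFilter(source, synonyms, true);
        return new TokenStreamComponents(source, sink);
      }
    };

    QueryParser parser = new QueryParser(Version.LUCENE_40, "body", analyzer);

    // Unescaped: the parser splits on whitespace first, so the analyzer sees
    // "rain" and "coat" separately and the synonym never matches.
    System.out.println(parser.parse("rain coat"));   // roughly: body:rain body:coat

    // Escaped space: the analyzer sees "rain coat" as one chunk and the
    // synonym rule rewrites it to a single token.
    System.out.println(parser.parse("rain\\ coat")); // roughly: body:rain_coat
  }
}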
On Sun, Jun 10, 2012 at 11:03 PM, John Berryman <[email protected]> wrote:

> According to https://issues.apache.org/jira/browse/LUCENE-2605, the Lucene
> QueryParser tokenizes on white space before giving any text to the Analyzer.
> This makes it impossible to use multi-term synonyms because the
> SynonymFilter only receives one word at a time.
>
> Resolution to this would really help with my current project. My project
> client sells clothing and accessories online. They have plenty of examples
> of compound words, e.g. "rain coat". But some of these compound words are
> really tripping them up. A prime example is that a search for "dress shoes"
> returns a list of dresses and random shoes (not necessarily dress shoes). I
> wish that I was able to synonym compound words to single tokens (e.g. "dress
> shoes => dress_shoes"), but with this whitespace tokenization issue, it's
> impossible.
>
> Has anything happened with this bug recently? For a short time I've got a
> client that would be willing to pay for this issue to be fixed if it's not
> too much of a rabbit hole. Anyone care to catch me up with what this might
> entail?
>
> --
> LinkedIn
> Twitter

--
lucidimagination.com