I am hoping someone can point me in the right direction. While I have a working solution, I do not feel it is the best or most correct solution to the problem I am trying to solve.
My project uses Lucene to perform matching between two data sets, where one may have the text "Red Green" and the other would use "redgreen". To handle this, I have created a Token Pair Concatenating Filter:
https://github.com/jeremylong/DependencyCheck/blob/master/core/src/main/java/org/owasp/dependencycheck/data/lucene/TokenPairConcatenatingFilter.java

With this filter, the query "field:(red blue green)" ends up being parsed to "+field:red +field:redblue +field:blue +field:bluegreen +field:green".

However, my implementation ends up adding superfluous parentheses to the parsed query, and I'm fairly certain I've missed a few key points about how to implement a token filter that injects additional tokens into the stream. I would be most appreciative if someone could take a look at the implementation and suggest any improvements, or point me to any documentation that could help me better understand how a TokenFilter can inject additional tokens into the stream.

Thanks in advance,
Jeremy
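
P.S. To make the intended behaviour concrete, here is a minimal sketch of the token transformation I am after, written in plain Java with no Lucene dependency. The class and method names (TokenPairSketch, concatPairs) are made up for this illustration and are not the code in the linked filter; it only shows the output order I expect, not how a TokenFilter should produce it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not a Lucene TokenFilter.
final class TokenPairSketch {

    // For each token after the first, emit the concatenation of the
    // previous token and the current one, then the current token itself.
    static List<String> concatPairs(List<String> tokens) {
        List<String> out = new ArrayList<>();
        String previous = null;
        for (String token : tokens) {
            if (previous != null) {
                out.add(previous + token); // injected pair token
            }
            out.add(token);                // original token
            previous = token;
        }
        return out;
    }
}
```

Calling TokenPairSketch.concatPairs(List.of("red", "blue", "green")) yields the tokens red, redblue, blue, bluegreen, green, which is the same order as the terms in the parsed query above.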