Hello,
I have a search project which uses the Lucene PatternAnalyzer for its
text/query analysis.
At the moment it's configured like so:
analyzer = new PatternAnalyzer(Version.LUCENE_35, Pattern.compile("\\s+"),
true, null);
My goal here was to split the text into words on whitespace and make
matching case insensitive.
Thinking about this further, however, I probably want to be a little bit
more sophisticated: I'd like to ignore punctuation which occurs at the
beginning or end of a word.
Is this simply a matter of writing a regex which treats those cases the
same as a space?
Would I use something like this:
analyzer = new PatternAnalyzer(Version.LUCENE_35,
Pattern.compile("\\s+|\\p{Punct}+\\w|\\w\\p{Punct}"), true, null);
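
One way to check a candidate delimiter pattern is with plain java.util.regex, no Lucene needed. The sketch below assumes PatternAnalyzer splits text the same way String.split does, which is how its Javadoc describes it. The class name DelimiterCheck, the ALTERNATIVE pattern and the sample sentence are illustrative assumptions, not part of the setup above. On that assumption, the pattern above also consumes the word character next to the punctuation, whereas a pattern that matches only the punctuation itself does not:

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

final class DelimiterCheck {
    // The pattern from the message: each punctuation alternative also
    // matches (and therefore removes) one adjacent word character.
    static final Pattern ORIGINAL =
            Pattern.compile("\\s+|\\p{Punct}+\\w|\\w\\p{Punct}");

    // Hypothetical alternative: whitespace together with any punctuation
    // touching it, plus punctuation at the very start or end of the input.
    // Punctuation inside a word (e.g. "don't") is left alone.
    static final Pattern ALTERNATIVE =
            Pattern.compile("\\p{Punct}*\\s+\\p{Punct}*|^\\p{Punct}+|\\p{Punct}+$");

    // Lowercase the text, split on the delimiter, drop empty tokens.
    static String[] tokens(Pattern delimiter, String text) {
        return Arrays.stream(delimiter.split(text.toLowerCase(Locale.ROOT)))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String text = "Hello, world! (test)";
        // prints [hell, worl, es]
        System.out.println(Arrays.toString(tokens(ORIGINAL, text)));
        // prints [hello, world, test]
        System.out.println(Arrays.toString(tokens(ALTERNATIVE, text)));
    }
}
```

If that assumption holds, the same ALTERNATIVE pattern could be passed to the PatternAnalyzer constructor in place of "\\s+", but I have not verified that against Lucene 3.5 itself.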
Thanks so much!
Dave