Thanks, I was able to use the module. However, my Analyzer is not invoked on IndexWriter.addDocument(), even though I pass it to the constructor when creating the IndexWriterConfig. When I test the Analyzer by calling it explicitly, following the instructions under "Invoking the Analyzer" in http://lucene.apache.org/core/7_1_0/core/org/apache/lucene/analysis/package-summary.html, it works as expected.
Do you know what I could be missing? Please let me know if you need any more of my code.

Regards,
Armīns

On Mon, Jan 8, 2018 at 3:27 PM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi
>
> It is part of the analyzers-common module; it is not included in Lucene's
> core. Lucene's core module only has a single analyzer (StandardAnalyzer)
> and some helper classes, but not the full set of multi-purpose and
> language-specific ones.
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Armins Stepanjans [mailto:armins.bagr...@gmail.com]
> > Sent: Monday, January 8, 2018 2:09 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Looking For Tokenizer With Custom Delimeter
> >
> > Thanks for the solution, however I am unable to access the CharTokenizer
> > class when I import it using:
> >
> > import org.apache.lucene.analysis.util.*;
> >
> > Although I am able to access classes directly under analysis (or
> > analysis.standard) just fine with the import statement:
> >
> > import org.apache.lucene.analysis.*;
> >
> > Does this appear to be a Lucene-specific problem?
> >
> > P.S. I'm using Maven for managing my dependencies with the following two
> > statements for Lucene:
> >
> > <dependency>
> >   <groupId>org.apache.lucene</groupId>
> >   <artifactId>lucene-core</artifactId>
> >   <version>7.1.0</version>
> > </dependency>
> >
> > <dependency>
> >   <groupId>org.apache.lucene</groupId>
> >   <artifactId>lucene-queryparser</artifactId>
> >   <version>7.1.0</version>
> > </dependency>
> >
> > Regards,
> > Armīns
> >
> > On Mon, Jan 8, 2018 at 12:53 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> >
> > > Moin,
> > >
> > > Plain easy to customize with lambdas!
> > > E.g., an elegant way to create a tokenizer which behaves exactly as
> > > WhitespaceTokenizer and LowerCaseFilter is:
> > >
> > > Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(Character::isWhitespace,
> > >     Character::toLowerCase);
> > >
> > > Adjust with lambdas and you can create any tokenizer based on any
> > > character check, so to check for whitespace or underscore:
> > >
> > > Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(ch ->
> > >     Character.isWhitespace(ch) || ch == '_');
> > >
> > > Uwe
> > >
> > > -----
> > > Uwe Schindler
> > > Achterdiek 19, D-28357 Bremen
> > > http://www.thetaphi.de
> > > eMail: u...@thetaphi.de
> > >
> > > > -----Original Message-----
> > > > From: Armins Stepanjans [mailto:armins.bagr...@gmail.com]
> > > > Sent: Monday, January 8, 2018 11:30 AM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Looking For Tokenizer With Custom Delimeter
> > > >
> > > > Hi,
> > > >
> > > > I am looking for a tokenizer where I could specify the delimiters by
> > > > which the words are tokenized. For example, if I choose the delimiters
> > > > as ' ' and '_', the following string:
> > > > "foo__bar doo"
> > > > would be tokenized into:
> > > > "foo", "", "bar", "doo"
> > > > (The analyzer could further filter empty tokens, since having the empty
> > > > string token is not critical).
> > > >
> > > > Is such functionality built into Lucene (I'm working with 7.1.0) and does
> > > > this seem like the correct approach to the problem?
> > > >
> > > > Regards,
> > > > Armīns
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
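
A note on the separator-predicate approach quoted in this thread: as far as I understand CharTokenizer, a run of consecutive separators does not yield empty tokens, so "foo__bar doo" would come out as "foo", "bar", "doo" rather than with an empty string in between. Below is a minimal plain-JDK sketch of that splitting behaviour. It is not Lucene code and does not use any Lucene class; the class name SeparatorSplitSketch and the method tokenize are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Plain-JDK illustration only; this is NOT Lucene's CharTokenizer.
// SeparatorSplitSketch and tokenize are hypothetical names.
final class SeparatorSplitSketch {

    // Splits text at every code point accepted by isSeparator.
    // Runs of separators are skipped, so no empty tokens are produced.
    static List<String> tokenize(String text, IntPredicate isSeparator) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isSeparator.test(cp)) {
                // A separator ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        }
        // Flush the trailing token, if any.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same predicate shape as in the thread: whitespace or underscore.
        IntPredicate wsOrUnderscore = ch -> Character.isWhitespace(ch) || ch == '_';
        System.out.println(tokenize("foo__bar doo", wsOrUnderscore)); // prints [foo, bar, doo]
    }
}
```

If the empty token between the two underscores is actually needed, this sketch (and, I believe, a separator-predicate tokenizer in general) would not be enough on its own; that case would need a different splitting rule.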