Re: Looking For Tokenizer With Custom Delimeter

Armins Stepanjans Mon, 08 Jan 2018 07:53:34 -0800

Thanks, I was able to use the module. However, my Analyzer is not invoked
on IndexWriter.addDocument(), even though I pass it to the constructor
when creating the IndexWriterConfig. When I test the Analyzer by calling it
explicitly, following the instructions under "Invoking the Analyzer" in
http://lucene.apache.org/core/7_1_0/core/org/apache/lucene/analysis/package-summary.html
it works as expected.


Do you know what I could be missing?
Please let me know if you need any more of my code.

Regards,
Armīns

On Mon, Jan 8, 2018 at 3:27 PM, Uwe Schindler <[email protected]> wrote:

> Hi
>
> It is part of the analyzers-common module, it is not included in Lucene's
> core. Lucene's core module only has a single analyzer (StandardAnalyzer)
> and some helper classes, but not the full set of multi-purpose and language
> specific ones.
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
> > -----Original Message-----
> > From: Armins Stepanjans [mailto:[email protected]]
> > Sent: Monday, January 8, 2018 2:09 PM
> > To: [email protected]
> > Subject: Re: Looking For Tokenizer With Custom Delimeter
> >
> > Thanks for the solution. However, I am unable to access the CharTokenizer
> > class when I import it using:
> >
> > import org.apache.lucene.analysis.util.*;
> >
> > Although I am able to access classes directly under analysis (or
> > analysis.standard) just fine with the import statement:
> > import org.apache.lucene.analysis.*;
> >
> > Does this look like a Lucene-specific problem?
> >
> > P.S. I'm using Maven for managing my dependencies with the following two
> > statements for Lucene:
> >
> >         <dependency>
> >             <groupId>org.apache.lucene</groupId>
> >             <artifactId>lucene-core</artifactId>
> >             <version>7.1.0</version>
> >         </dependency>
> >
> >         <dependency>
> >             <groupId>org.apache.lucene</groupId>
> >             <artifactId>lucene-queryparser</artifactId>
> >             <version>7.1.0</version>
> >         </dependency>
> >
> > Regards,
> > Armīns
> >
> > On Mon, Jan 8, 2018 at 12:53 PM, Uwe Schindler <[email protected]> wrote:
> >
> > > Moin,
> > >
> > > Plain easy to customize with lambdas! E.g., an elegant way to create a
> > > tokenizer which behaves exactly like WhitespaceTokenizer combined with
> > > LowerCaseFilter is:
> > >
> > > Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(
> > >     Character::isWhitespace, Character::toLowerCase);
> > >
> > > Adjust the lambda and you can create a tokenizer based on any
> > > character check, e.g. to split on whitespace or underscore:
> > >
> > > Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(ch ->
> > >     Character.isWhitespace(ch) || ch == '_');
> > >
> > > Uwe
> > >
> > > -----
> > > Uwe Schindler
> > > Achterdiek 19, D-28357 Bremen
> > > http://www.thetaphi.de
> > > eMail: [email protected]
> > >
> > > > -----Original Message-----
> > > > From: Armins Stepanjans [mailto:[email protected]]
> > > > Sent: Monday, January 8, 2018 11:30 AM
> > > > To: [email protected]
> > > > Subject: Looking For Tokenizer With Custom Delimeter
> > > >
> > > > Hi,
> > > >
> > > > I am looking for a tokenizer where I can specify the delimiters by which
> > > > the words are tokenized. For example, if I choose ' ' and '_' as the
> > > > delimiters, the following string:
> > > > "foo__bar doo"
> > > > would be tokenized into:
> > > > "foo", "", "bar", "doo"
> > > > (The analyzer could further filter empty tokens, since having the empty
> > > > string token is not critical).
> > > >
> > > > Is such functionality built into Lucene (I'm working with 7.1.0), and does
> > > > this seem like the correct approach to the problem?
> > > >
> > > > Regards,
> > > > Armīns
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
> > >
>
>
>
>
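
The separator-predicate idea discussed in this thread (split on whitespace or
'_') can be illustrated without Lucene on the classpath. The following is a
minimal plain-JDK sketch, not Lucene's implementation: the class name
SeparatorSplitSketch and the helper split() are hypothetical and exist only
for illustration. It collects maximal runs of non-separator code points, so
consecutive separators produce no empty tokens; to my understanding
CharTokenizer behaves the same way in that respect, which means the "" token
from the original question would not appear.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class SeparatorSplitSketch {

    // Hypothetical helper (not a Lucene API): returns the maximal runs of
    // code points for which isSeparator is false.
    public static List<String> split(String text, IntPredicate isSeparator) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isSeparator.test(cp)) {
                // A separator ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        }
        // Flush the trailing token, if any.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same predicate as in the thread: whitespace or underscore.
        IntPredicate separator = ch -> Character.isWhitespace(ch) || ch == '_';
        System.out.println(split("foo__bar doo", separator)); // prints [foo, bar, doo]
    }
}
```

The predicate is an IntPredicate over code points, the same shape of lambda
that is passed to CharTokenizer.fromSeparatorCharPredicate in the thread, so
a separator check tried out here can be reused there unchanged.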
