Thanks for the solution. However, I am unable to access the CharTokenizer
class when I import it using:
import org.apache.lucene.analysis.util.*;
even though I can access classes directly under analysis (or
analysis.standard) just fine with the import statement:
import org.apache.lucene.analysis.*;
Is this a Lucene-specific problem?
P.S. I'm using Maven to manage my dependencies, with the following two
dependency entries for Lucene:
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>7.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-queryparser</artifactId>
  <version>7.1.0</version>
</dependency>
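I have not added any other Lucene modules. If CharTokenizer ships in the
separate analyzers-common module rather than in lucene-core (an assumption on
my part that I have not verified against the 7.1.0 artifacts), I would
presumably also need an entry along these lines:

```xml
<!-- Assumption: org.apache.lucene.analysis.util.CharTokenizer is packaged
     in lucene-analyzers-common, not in lucene-core. Version matches the
     other Lucene entries. -->
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-analyzers-common</artifactId>
  <version>7.1.0</version>
</dependency>
```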
Regards,
Armīns
On Mon, Jan 8, 2018 at 12:53 PM, Uwe Schindler <[email protected]> wrote:
> Moin,
>
> This is plain easy to customize with lambdas! E.g., an elegant way to
> create a tokenizer which behaves exactly like WhitespaceTokenizer followed
> by LowerCaseFilter is:
>
> Tokenizer tok =
> CharTokenizer.fromSeparatorCharPredicate(Character::isWhitespace,
> Character::toLowerCase);
>
> Adjust with Lambdas and you can create any tokenizer based on any
> character check, so to check for whitespace or underscore:
>
> Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(ch ->
> Character.isWhitespace(ch) || ch == '_');
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
> > -----Original Message-----
> > From: Armins Stepanjans [mailto:[email protected]]
> > Sent: Monday, January 8, 2018 11:30 AM
> > To: [email protected]
> > Subject: Looking For Tokenizer With Custom Delimeter
> >
> > Hi,
> >
> > I am looking for a tokenizer where I could specify the delimiters by which
> > the words are tokenized. For example, if I choose the delimiters ' ' and
> > '_', the following string:
> > "foo__bar doo"
> > would be tokenized into:
> > "foo", "", "bar", "doo"
> > (The analyzer could further filter empty tokens, since having the empty
> > string token is not critical).
> >
> > Is such functionality built into Lucene (I'm working with 7.1.0) and does
> > this seem like the correct approach to the problem?
> >
> > Regards,
> > Armīns