If you used our products, which have Elastic plugins, POS tagging,
stemming and lemmatisation, it would be much easier.




Kind Regards



Chris

VP International

E: cbr...@basistech.com

T: +44 208 622 2900

M: +44 7796946934

USA Number: +16173867107

Lakeside House, 1 Furzeground Way, Stockley Park, Middlesex, UB11 1BD, UK


On 29 May 2017 at 19:42, Christian Becker <christian.frei...@gmail.com>
wrote:

> I'm sorry - I didn't mention that my intention is to have linguistic
> annotations like stems and maybe part-of-speech information. Tokenization
> is certainly one of the things I want to do.
>
> 2017-05-29 19:02 GMT+02:00 Robert Muir <rcm...@gmail.com>:
>
> > On Mon, May 29, 2017 at 8:36 AM, Christian Becker
> > <christian.frei...@gmail.com> wrote:
> > > Hi There,
> > >
> > > I'm new to Lucene (in fact I'm interested in Elasticsearch, but in
> > > this case it's related to Lucene) and I want to run some experiments
> > > with some enhanced analyzers.
> > >
> > > In fact, I have an external linguistic component which I want to
> > > connect to Lucene / Elasticsearch, so before I produce a bunch of
> > > useless code I want to make sure that I'm going the right way.
> > >
> > > The linguistic component needs at least a whole sentence as input
> > > (ideally it would get the whole text at once).
> > >
> > > So as far as I can see, I would need to create a custom Analyzer and
> > > override "createComponents" and "normalize".
> > >
> >
> > There is a base class for tokenizers that want to see
> > sentences-at-a-time in order to divide into words:
> >
> > https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/SegmentingTokenizerBase.java#L197-L201
> >
> > There are two examples that use it in the test class:
> >
> > https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/test/org/apache/lucene/analysis/util/TestSegmentingTokenizerBase.java#L145
> >
>
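For reference, the SegmentingTokenizerBase approach mentioned above only
requires implementing setNextSentence() and incrementWord(); the base class
buffers the reader and hands over one sentence at a time. A minimal sketch,
modelled on the WholeSentenceTokenizer in the linked test class (the class
names below are made up, and the call to the external linguistic component
is only indicated in comments), could look like this:

import java.text.BreakIterator;
import java.util.Locale;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.util.SegmentingTokenizerBase;

// Sentence-at-a-time tokenizer: the base class buffers the reader, finds
// sentence boundaries with the BreakIterator, and hands each sentence to
// setNextSentence(); incrementWord() is then called until it returns false.
// Here the whole sentence is emitted as a single token - the point where an
// external linguistic component would be invoked is marked in the comments.
public final class SentenceTokenizer extends SegmentingTokenizerBase {

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);

  private int sentenceStart, sentenceEnd;
  private boolean hasSentence;

  public SentenceTokenizer() {
    super(BreakIterator.getSentenceInstance(Locale.ROOT));
  }

  @Override
  protected void setNextSentence(int sentenceStart, int sentenceEnd) {
    // sentenceStart/sentenceEnd are offsets into the inherited 'buffer' array.
    // An external component could be called here on
    // new String(buffer, sentenceStart, sentenceEnd - sentenceStart)
    // and its results cached so incrementWord() can emit them one by one.
    this.sentenceStart = sentenceStart;
    this.sentenceEnd = sentenceEnd;
    this.hasSentence = true;
  }

  @Override
  protected boolean incrementWord() {
    if (!hasSentence) {
      return false; // no more tokens for this sentence
    }
    hasSentence = false;
    clearAttributes();
    termAtt.copyBuffer(buffer, sentenceStart, sentenceEnd - sentenceStart);
    // 'offset' is the accumulated offset of previously consumed buffers
    offsetAtt.setOffset(correctOffset(offset + sentenceStart),
                        correctOffset(offset + sentenceEnd));
    return true;
  }
}

// Wiring it into an Analyzer, as discussed above: only createComponents
// needs to be overridden; stemming / POS filters would be chained here.
class SentenceAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer tokenizer = new SentenceTokenizer();
    return new TokenStreamComponents(tokenizer);
  }
}

The Analyzer is the piece that would be registered with Elasticsearch as an
analysis plugin; stems or part-of-speech annotations from the external
component could be exposed on top of this as custom TokenFilters or token
attributes.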
