I am glad that basistech has tools to bring lemmings back :-} I am guessing you also have lemmati[z|s]ation.
> On May 29, 2017, at 12:37 PM, Chris Brown <cbr...@basistech.com> wrote:
>
> If you used our products, which have Elastic plugins, POS, Stems and
> Leminisation, it would be much easier.
>
> Kind Regards
>
> Chris
> VP International
> E: cbr...@basistech.com
> T: +44 208 622 2900
> M: +44 7796946934
> USA Number: +16173867107
> Lakeside House, 1 Furzeground Way, Stockley Park, Middlesex, UB11 1BD, UK
>
> On 29 May 2017 at 19:42, Christian Becker <christian.frei...@gmail.com> wrote:
>
>> I'm sorry - I didn't mention that my intention is to have linguistic
>> annotations like stems and maybe part-of-speech information. For sure,
>> tokenization is one of the things I want to do.
>>
>> 2017-05-29 19:02 GMT+02:00 Robert Muir <rcm...@gmail.com>:
>>
>>> On Mon, May 29, 2017 at 8:36 AM, Christian Becker
>>> <christian.frei...@gmail.com> wrote:
>>>>
>>>> Hi there,
>>>>
>>>> I'm new to Lucene (in fact I'm interested in Elasticsearch, but in
>>>> this case it's related to Lucene) and I want to run some experiments
>>>> with some enhanced analyzers.
>>>>
>>>> I have an external linguistic component which I want to connect to
>>>> Lucene / Elasticsearch. So before I produce a bunch of useless code,
>>>> I want to make sure that I'm going the right way.
>>>>
>>>> The linguistic component needs at least a whole sentence as input (at
>>>> best it would get the whole text at once).
>>>>
>>>> So as far as I can see, I would need to create a custom Analyzer and
>>>> override "createComponents" and "normalize".
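For reference, a custom Analyzer along the lines described in the quoted question might look like the following minimal sketch. It assumes a Lucene 6.x-era API; the class name LinguisticAnalyzer is a placeholder, and the StandardTokenizer/LowerCaseFilter chain stands in for wherever the external linguistic component would plug in.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Placeholder analyzer: swap StandardTokenizer for a sentence-aware
// tokenizer and chain your own TokenFilters (stems, POS, ...) after it.
public class LinguisticAnalyzer extends Analyzer {

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    // Index-time chain: tokenizer first, then any number of filters.
    Tokenizer source = new StandardTokenizer();
    TokenStream result = new LowerCaseFilter(source);
    return new TokenStreamComponents(source, result);
  }

  @Override
  protected TokenStream normalize(String fieldName, TokenStream in) {
    // Query-time normalization of untokenized input; usually a
    // lightweight subset of the index-time filter chain.
    return new LowerCaseFilter(in);
  }
}
```

Note that normalize() should not re-tokenize; it only applies the character-level parts of the chain.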
>>>>
>>>
>>> There is a base class for tokenizers that want to see
>>> sentences-at-a-time in order to divide into words:
>>>
>>> https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/SegmentingTokenizerBase.java#L197-L201
>>>
>>> There are two examples that use it in the test class:
>>>
>>> https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/test/org/apache/lucene/analysis/util/TestSegmentingTokenizerBase.java#L145
>>
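For the archives, here is a sketch of a SegmentingTokenizerBase subclass, closely following the WholeSentenceTokenizer example in the linked test class. It emits each detected sentence as a single token; a real integration would instead hand each sentence to the external linguistic component inside setNextSentence() and emit its word segments from incrementWord().

```java
import java.text.BreakIterator;
import java.util.Locale;

import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.util.SegmentingTokenizerBase;

// Emits each sentence as one token. The protected 'buffer' and 'offset'
// fields come from SegmentingTokenizerBase.
public class WholeSentenceTokenizer extends SegmentingTokenizerBase {
  private int sentenceStart, sentenceEnd;
  private boolean hasSentence;

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);

  public WholeSentenceTokenizer() {
    // The base class uses a BreakIterator to find sentence boundaries.
    super(BreakIterator.getSentenceInstance(Locale.ROOT));
  }

  @Override
  protected void setNextSentence(int sentenceStart, int sentenceEnd) {
    // Called once per sentence; record the span (and, in a real
    // implementation, run the linguistic component on it here).
    this.sentenceStart = sentenceStart;
    this.sentenceEnd = sentenceEnd;
    hasSentence = true;
  }

  @Override
  protected boolean incrementWord() {
    // Called repeatedly to emit tokens for the current sentence.
    if (!hasSentence) {
      return false;
    }
    hasSentence = false;
    clearAttributes();
    termAtt.copyBuffer(buffer, sentenceStart, sentenceEnd - sentenceStart);
    offsetAtt.setOffset(correctOffset(offset + sentenceStart),
                        correctOffset(offset + sentenceEnd));
    return true;
  }
}
```

The key design point is that the base class handles buffering and offset correction, so the subclass only implements sentence handling (setNextSentence) and token emission (incrementWord).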