In this case it's more fun not going the easy way 😉
Chris Collins <chris_j_coll...@yahoo.com.invalid> wrote on Mon, 29 May 2017 at 21:41:

> I am glad that basistech has tools to bring lemmings back :-} I am
> guessing you also have lemmati[z|s]ation.
>
> > On May 29, 2017, at 12:37 PM, Chris Brown <cbr...@basistech.com> wrote:
> >
> > If you used our products, which have Elastic plugins, POS, stems and
> > lemmatisation, it would be much easier.
> >
> > Kind Regards
> >
> > Chris
> > VP International
> > E: cbr...@basistech.com
> > T: +44 208 622 2900
> > M: +44 7796946934
> > USA Number: +16173867107
> > Lakeside House, 1 Furzeground Way, Stockley Park, Middlesex, UB11 1BD, UK
> >
> > On 29 May 2017 at 19:42, Christian Becker <christian.frei...@gmail.com> wrote:
> >
> >> I'm sorry - I didn't mention that my intention is to have linguistic
> >> annotations like stems and maybe part-of-speech information. For sure,
> >> tokenization is one of the things I want to do.
> >>
> >> 2017-05-29 19:02 GMT+02:00 Robert Muir <rcm...@gmail.com>:
> >>
> >>> On Mon, May 29, 2017 at 8:36 AM, Christian Becker
> >>> <christian.frei...@gmail.com> wrote:
> >>>> Hi there,
> >>>>
> >>>> I'm new to Lucene (in fact I'm interested in ElasticSearch, but in
> >>>> this case it's related to Lucene) and I want to make some
> >>>> experiments with some enhanced analyzers.
> >>>>
> >>>> Indeed, I have an external linguistic component which I want to
> >>>> connect to Lucene / ElasticSearch. So before I produce a bunch of
> >>>> useless code, I want to make sure that I'm going the right way.
> >>>>
> >>>> The linguistic component needs at least a whole sentence as input
> >>>> (at best it would be the whole text at once).
> >>>>
> >>>> So as far as I can see, I would need to create a custom Analyzer and
> >>>> override "createComponents" and "normalize".
> >>>
> >>> There is a base class for tokenizers that want to see
> >>> sentences-at-a-time in order to divide into words:
> >>>
> >>> https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/SegmentingTokenizerBase.java#L197-L201
> >>>
> >>> There are two examples that use it in the test class:
> >>>
> >>> https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/test/org/apache/lucene/analysis/util/TestSegmentingTokenizerBase.java#L145
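
For the archives, a rough, untested sketch of what Robert's
SegmentingTokenizerBase suggestion could look like when wired to an
external component. It follows the pattern in the linked test class;
ExternalAnnotator and its Token type are hypothetical stand-ins for
whatever API the linguistic component actually exposes. The base class
buffers the text, calls setNextSentence() once per sentence, and then
pulls tokens through incrementWord():

import java.text.BreakIterator;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.util.SegmentingTokenizerBase;

// Hypothetical stand-in for the external linguistic component.
interface ExternalAnnotator {
  List<Token> analyze(String sentence);

  class Token {
    final String stem;    // annotation to index (could also carry POS)
    final int start, end; // offsets relative to the analyzed sentence
    Token(String stem, int start, int end) {
      this.stem = stem;
      this.start = start;
      this.end = end;
    }
  }
}

public final class LinguisticTokenizer extends SegmentingTokenizerBase {

  private final ExternalAnnotator annotator;
  private Iterator<ExternalAnnotator.Token> tokens;
  private int sentenceStart;

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);

  public LinguisticTokenizer(ExternalAnnotator annotator) {
    // The BreakIterator decides where sentences end; the base class
    // handles buffering and calls setNextSentence() for each one.
    super(BreakIterator.getSentenceInstance(Locale.ROOT));
    this.annotator = annotator;
  }

  @Override
  protected void setNextSentence(int sentenceStart, int sentenceEnd) {
    // Hand the whole sentence to the external component in one call.
    this.sentenceStart = sentenceStart;
    String sentence = new String(buffer, sentenceStart, sentenceEnd - sentenceStart);
    tokens = annotator.analyze(sentence).iterator();
  }

  @Override
  protected boolean incrementWord() {
    // Emit the external component's tokens one at a time.
    if (tokens == null || !tokens.hasNext()) {
      return false;
    }
    ExternalAnnotator.Token t = tokens.next();
    clearAttributes();
    termAtt.setEmpty().append(t.stem);
    offsetAtt.setOffset(correctOffset(offset + sentenceStart + t.start),
                        correctOffset(offset + sentenceStart + t.end));
    return true;
  }
}

Wiring it into an Analyzer is then the easy part (myAnnotator being
your component instance):

Analyzer analyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    return new TokenStreamComponents(new LinguisticTokenizer(myAnnotator));
  }
};

As far as I can tell, overriding normalize() only matters for query
terms that bypass full tokenization (wildcards and the like), so the
tokenizer above would be the main piece.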