In this case it's more fun not going the easy way 😉
Chris Collins <chris_j_coll...@yahoo.com.invalid> wrote on Mon, 29 May 2017 at 21:41:

> I am glad that basistech has tools to bring lemmings back :-} I am
> guessing you also have lemmati[z|s]ation.
>
> > On May 29, 2017, at 12:37 PM, Chris Brown <cbr...@basistech.com> wrote:
> >
> > If you used our products, which have Elastic plugins, POS, stems and
> > lemmatisation, it would be much easier.
> >
> > Kind Regards
> >
> > Chris
> > VP International
> > E: cbr...@basistech.com
> > T: +44 208 622 2900
> > M: +44 7796946934
> > USA Number: +16173867107
> > Lakeside House, 1 Furzeground Way, Stockley Park, Middlesex, UB11 1BD, UK
> >
> > On 29 May 2017 at 19:42, Christian Becker <christian.frei...@gmail.com> wrote:
> >
> >> I'm sorry - I didn't mention that my intention is to have linguistic
> >> annotations like stems and maybe part-of-speech information. For sure,
> >> tokenization is one of the things I want to do.
> >>
> >> 2017-05-29 19:02 GMT+02:00 Robert Muir <rcm...@gmail.com>:
> >>
> >>> On Mon, May 29, 2017 at 8:36 AM, Christian Becker
> >>> <christian.frei...@gmail.com> wrote:
> >>>> Hi there,
> >>>>
> >>>> I'm new to Lucene (in fact I'm interested in ElasticSearch, but in
> >>>> this case it's related to Lucene) and I want to make some
> >>>> experiments with some enhanced analyzers.
> >>>>
> >>>> Indeed, I have an external linguistic component which I want to
> >>>> connect to Lucene / ElasticSearch. So before I produce a bunch of
> >>>> useless code, I want to make sure that I'm going the right way.
> >>>>
> >>>> The linguistic component needs at least a whole sentence as input
> >>>> (at best it would be the whole text at once).
> >>>>
> >>>> So as far as I can see, I would need to create a custom Analyzer and
> >>>> override "createComponents" and "normalize".
> >>>
> >>> There is a base class for tokenizers that want to see
> >>> sentences-at-a-time in order to divide into words:
> >>>
> >>> https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/SegmentingTokenizerBase.java#L197-L201
> >>>
> >>> There are two examples that use it in the test class:
> >>>
> >>> https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/test/org/apache/lucene/analysis/util/TestSegmentingTokenizerBase.java#L145
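
For the archives, a rough, untested sketch of what Robert's
SegmentingTokenizerBase suggestion could look like when wired to an
external component. It follows the pattern in the linked test class;
ExternalAnnotator and its Token type are hypothetical stand-ins for
whatever API the linguistic component actually exposes. The base class
buffers the text, calls setNextSentence() once per sentence, and then
pulls tokens through incrementWord():

import java.text.BreakIterator;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.util.SegmentingTokenizerBase;

// Hypothetical stand-in for the external linguistic component.
interface ExternalAnnotator {
  List<Token> analyze(String sentence);

  class Token {
    final String stem;    // annotation to index (could also carry POS)
    final int start, end; // offsets relative to the analyzed sentence
    Token(String stem, int start, int end) {
      this.stem = stem;
      this.start = start;
      this.end = end;
    }
  }
}

public final class LinguisticTokenizer extends SegmentingTokenizerBase {

  private final ExternalAnnotator annotator;
  private Iterator<ExternalAnnotator.Token> tokens;
  private int sentenceStart;

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);

  public LinguisticTokenizer(ExternalAnnotator annotator) {
    // The BreakIterator decides where sentences end; the base class
    // handles buffering and calls setNextSentence() for each one.
    super(BreakIterator.getSentenceInstance(Locale.ROOT));
    this.annotator = annotator;
  }

  @Override
  protected void setNextSentence(int sentenceStart, int sentenceEnd) {
    // Hand the whole sentence to the external component in one call.
    this.sentenceStart = sentenceStart;
    String sentence = new String(buffer, sentenceStart, sentenceEnd - sentenceStart);
    tokens = annotator.analyze(sentence).iterator();
  }

  @Override
  protected boolean incrementWord() {
    // Emit the external component's tokens one at a time.
    if (tokens == null || !tokens.hasNext()) {
      return false;
    }
    ExternalAnnotator.Token t = tokens.next();
    clearAttributes();
    termAtt.setEmpty().append(t.stem);
    offsetAtt.setOffset(correctOffset(offset + sentenceStart + t.start),
                        correctOffset(offset + sentenceStart + t.end));
    return true;
  }
}

Wiring it into an Analyzer is then the easy part (myAnnotator being
your component instance):

Analyzer analyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    return new TokenStreamComponents(new LinguisticTokenizer(myAnnotator));
  }
};

As far as I can tell, overriding normalize() only matters for query
terms that bypass full tokenization (wildcards and the like), so the
tokenizer above would be the main piece.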