I am glad that basistech has tools to bring lemmings back :-} I am guessing you also have lemmati[z|s]ation.
> On May 29, 2017, at 12:37 PM, Chris Brown <cbr...@basistech.com> wrote:
>
> If you used our products, which have Elastic plugins, POS, Stems and
> Leminisation, it would be much easier.
>
> Kind Regards
>
> Chris
> VP International
> E: cbr...@basistech.com
> T: +44 208 622 2900
> M: +44 7796946934
> USA Number: +16173867107
> Lakeside House, 1 Furzeground Way, Stockley Park, Middlesex, UB11 1BD, UK
>
> On 29 May 2017 at 19:42, Christian Becker <christian.frei...@gmail.com> wrote:
>
>> I'm sorry - I didn't mention that my intention is to have linguistic
>> annotations like stems and maybe part-of-speech information. For sure,
>> tokenization is one of the things I want to do.
>>
>> 2017-05-29 19:02 GMT+02:00 Robert Muir <rcm...@gmail.com>:
>>
>>> On Mon, May 29, 2017 at 8:36 AM, Christian Becker
>>> <christian.frei...@gmail.com> wrote:
>>>>
>>>> Hi there,
>>>>
>>>> I'm new to Lucene (in fact I'm interested in Elasticsearch, but in
>>>> this case it's related to Lucene) and I want to run some experiments
>>>> with some enhanced analyzers.
>>>>
>>>> I have an external linguistic component which I want to connect to
>>>> Lucene / Elasticsearch. So before I produce a bunch of useless code,
>>>> I want to make sure that I'm going the right way.
>>>>
>>>> The linguistic component needs at least a whole sentence as input (at
>>>> best it would get the whole text at once).
>>>>
>>>> So as far as I can see, I would need to create a custom Analyzer and
>>>> override "createComponents" and "normalize".
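For reference, a custom Analyzer along the lines described in the quoted question might look like the following minimal sketch. It assumes a Lucene 6.x-era API; the class name LinguisticAnalyzer is a placeholder, and the StandardTokenizer/LowerCaseFilter chain stands in for wherever the external linguistic component would plug in.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Placeholder analyzer: swap StandardTokenizer for a sentence-aware
// tokenizer and chain your own TokenFilters (stems, POS, ...) after it.
public class LinguisticAnalyzer extends Analyzer {

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    // Index-time chain: tokenizer first, then any number of filters.
    Tokenizer source = new StandardTokenizer();
    TokenStream result = new LowerCaseFilter(source);
    return new TokenStreamComponents(source, result);
  }

  @Override
  protected TokenStream normalize(String fieldName, TokenStream in) {
    // Query-time normalization of untokenized input; usually a
    // lightweight subset of the index-time filter chain.
    return new LowerCaseFilter(in);
  }
}
```

Note that normalize() should not re-tokenize; it only applies the character-level parts of the chain.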
>>>>
>>>
>>> There is a base class for tokenizers that want to see
>>> sentences-at-a-time in order to divide into words:
>>>
>>> https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/SegmentingTokenizerBase.java#L197-L201
>>>
>>> There are two examples that use it in the test class:
>>>
>>> https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/test/org/apache/lucene/analysis/util/TestSegmentingTokenizerBase.java#L145
>>
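For the archives, here is a sketch of a SegmentingTokenizerBase subclass, closely following the WholeSentenceTokenizer example in the linked test class. It emits each detected sentence as a single token; a real integration would instead hand each sentence to the external linguistic component inside setNextSentence() and emit its word segments from incrementWord().

```java
import java.text.BreakIterator;
import java.util.Locale;

import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.util.SegmentingTokenizerBase;

// Emits each sentence as one token. The protected 'buffer' and 'offset'
// fields come from SegmentingTokenizerBase.
public class WholeSentenceTokenizer extends SegmentingTokenizerBase {
  private int sentenceStart, sentenceEnd;
  private boolean hasSentence;

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);

  public WholeSentenceTokenizer() {
    // The base class uses a BreakIterator to find sentence boundaries.
    super(BreakIterator.getSentenceInstance(Locale.ROOT));
  }

  @Override
  protected void setNextSentence(int sentenceStart, int sentenceEnd) {
    // Called once per sentence; record the span (and, in a real
    // implementation, run the linguistic component on it here).
    this.sentenceStart = sentenceStart;
    this.sentenceEnd = sentenceEnd;
    hasSentence = true;
  }

  @Override
  protected boolean incrementWord() {
    // Called repeatedly to emit tokens for the current sentence.
    if (!hasSentence) {
      return false;
    }
    hasSentence = false;
    clearAttributes();
    termAtt.copyBuffer(buffer, sentenceStart, sentenceEnd - sentenceStart);
    offsetAtt.setOffset(correctOffset(offset + sentenceStart),
                        correctOffset(offset + sentenceEnd));
    return true;
  }
}
```

The key design point is that the base class handles buffering and offset correction, so the subclass only implements sentence handling (setNextSentence) and token emission (incrementWord).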