Sentence recognition is usually an NLP problem, and it is probably best handled outside of Solr. For example, you would train (or reuse) a sentence detection model, run it over your text, inject a sentence delimiter, and then use that delimiter as the basis for tokenization.
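A rough sketch of that approach using OpenNLP's SentenceDetectorME (assuming the pre-trained en-sent.bin English model; the pilcrow delimiter is just an arbitrary choice):

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import java.io.FileInputStream;
import java.io.InputStream;

public class SentenceDelimiterDemo {
    public static void main(String[] args) throws Exception {
        // en-sent.bin is OpenNLP's pre-trained English sentence model;
        // adjust the path for your environment.
        try (InputStream modelIn = new FileInputStream("en-sent.bin")) {
            SentenceModel model = new SentenceModel(modelIn);
            SentenceDetectorME detector = new SentenceDetectorME(model);

            String text = "Solr indexes documents. Each sentence should become one token.";

            // Split into sentences, then re-join with a delimiter that is
            // unlikely to occur in real text (a pilcrow here, arbitrarily).
            String[] sentences = detector.sentDetect(text);
            String delimited = String.join(" \u00B6 ", sentences);

            // Feed 'delimited' to a field whose tokenizer splits on the
            // delimiter (e.g. a pattern-based tokenizer).
            System.out.println(delimited);
        }
    }
}

On the Solr side you could then point the field at something like PatternTokenizerFactory splitting on that delimiter; the exact schema wiring will depend on your setup.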
More info on sentence recognition: http://opennlp.apache.org/documentation/manual/opennlp.html

On Wed, Sep 23, 2015 at 11:18 AM, Ziqi Zhang <ziqi.zh...@sheffield.ac.uk> wrote:

> Hi
>
> I need a special kind of 'token' which is a sentence, so I need a
> tokenizer that splits text into sentences.
>
> I wonder if there are already such or similar implementations?
>
> If I have to implement it myself, I suppose I need to implement a subclass
> of Tokenizer. Having looked at a few existing implementations, it does not
> look very straightforward. A few pointers would be highly appreciated.
>
> Many thanks

--
Doug Turnbull | Search Relevance Consultant | OpenSource Connections <http://opensourceconnections.com>, LLC | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>