Sentence recognition is usually an NLP problem, and it is probably best handled outside of Solr. For example, you would train (or reuse) a sentence detection model, run it over your text, inject a sentence delimiter, and then use that delimiter as the basis for tokenization.
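A rough sketch of that approach using OpenNLP's SentenceDetectorME (assuming the pre-trained en-sent.bin English model; the pilcrow delimiter is just an arbitrary choice):

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import java.io.FileInputStream;
import java.io.InputStream;

public class SentenceDelimiterDemo {
    public static void main(String[] args) throws Exception {
        // en-sent.bin is OpenNLP's pre-trained English sentence model;
        // adjust the path for your environment.
        try (InputStream modelIn = new FileInputStream("en-sent.bin")) {
            SentenceModel model = new SentenceModel(modelIn);
            SentenceDetectorME detector = new SentenceDetectorME(model);

            String text = "Solr indexes documents. Each sentence should become one token.";

            // Split into sentences, then re-join with a delimiter that is
            // unlikely to occur in real text (a pilcrow here, arbitrarily).
            String[] sentences = detector.sentDetect(text);
            String delimited = String.join(" \u00B6 ", sentences);

            // Feed 'delimited' to a field whose tokenizer splits on the
            // delimiter (e.g. a pattern-based tokenizer).
            System.out.println(delimited);
        }
    }
}

On the Solr side you could then point the field at something like PatternTokenizerFactory splitting on that delimiter; the exact schema wiring will depend on your setup.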
More info on sentence recognition: http://opennlp.apache.org/documentation/manual/opennlp.html

On Wed, Sep 23, 2015 at 11:18 AM, Ziqi Zhang <ziqi.zh...@sheffield.ac.uk> wrote:

> Hi
>
> I need a special kind of 'token' which is a sentence, so I need a
> tokenizer that splits text into sentences.
>
> I wonder if there are already such or similar implementations?
>
> If I have to implement it myself, I suppose I need to implement a subclass
> of Tokenizer. Having looked at a few existing implementations, it does not
> look very straightforward. A few pointers would be highly appreciated.
>
> Many thanks

--
Doug Turnbull | Search Relevance Consultant | OpenSource Connections <http://opensourceconnections.com>, LLC | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>