Thanks! I didn't see this support in 1.5.2-incubating. I'll build from trunk and try it.

On Thu, May 31, 2012 at 7:05 AM, William Colen <[email protected]> wrote:

> As far as I know you don't need a CLA for a patch. Simply open a Jira and
> attach your patch to it.
>
> Besides what James pointed out, you may also want to change the EOS
> characters. There are two related new features that are already
> implemented in the trunk:
>
> https://issues.apache.org/jira/browse/OPENNLP-428
> This one added an optional command line argument where you set the
> end-of-sentence characters. This setting will be persisted to the model.
> If you are using the API you can create a SentenceDetectorFactory and use
> it to set the EOS chars.
>
> https://issues.apache.org/jira/browse/OPENNLP-434
> This is a new feature that allows customizing the SentenceDetector. You
> can extend the SentenceDetectorFactory and override methods as needed. You
> can pass in the customized factory using either the command line or the
> API.
>
>
> On Wed, May 30, 2012 at 7:19 PM, James Kosin <[email protected]> wrote:
>
> > Hi Soubhik,
> >
> > Should already be supported.
> > You have to pass -encoding utf8 to the command line interface.
> >
> > James
> >
> > On 5/30/2012 1:52 PM, Soubhik (সৌভিক) wrote:
> > > Hi,
> > >
> > > I'm trying to use OpenNLP to train a sentence detector for the Bengali
> > > language ("bn"). I would like to add support for the Unicode danda
> > > character in the opennlp.tools.sentdetect.lang.Factory class. This
> > > character is a sentence break in Bengali, Hindi and several other
> > > Indian languages. The code change should be small (< 10 lines).
> > >
> > > Is it correct to think that a change of this size will not require a
> > > CLA?
> > >
> > > Ref: en.wikipedia.org/wiki/Danda
> > >
> > > Regards,
> > > Soubhik.

--
Soubhik Bhattacharya
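A minimal, library-independent sketch of what adding the danda to the set of EOS characters amounts to: every occurrence of a configured character becomes a sentence-boundary candidate. The class name EosScanner, the INDIC_EOS set (., !, ?, U+0964 DANDA, U+0965 DOUBLE DANDA) and the romanized sample sentence are hypothetical. This is not OpenNLP code and does not reproduce SentenceDetectorFactory. OpenNLP's sentence detector uses a trained model to decide which candidates are real boundaries, whereas naiveSplit below simply accepts all of them.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not OpenNLP): finds end-of-sentence (EOS) candidates
 * using a configurable character set that includes the danda.
 */
public class EosScanner {

    /** Assumed EOS set: ASCII terminators plus DANDA (U+0964) and DOUBLE DANDA (U+0965). */
    static final char[] INDIC_EOS = {'.', '!', '?', '\u0964', '\u0965'};

    private final char[] eosChars;

    EosScanner(char[] eosChars) {
        this.eosChars = eosChars.clone(); // defensive copy of the configured set
    }

    private boolean isEos(char c) {
        for (char eos : eosChars) {
            if (c == eos) {
                return true;
            }
        }
        return false;
    }

    /** Returns the index of every EOS character; these are candidates, not decisions. */
    List<Integer> candidatePositions(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (isEos(text.charAt(i))) {
                positions.add(i);
            }
        }
        return positions;
    }

    /** Naive split that accepts every candidate as a boundary (a trained model would filter them). */
    List<String> naiveSplit(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int pos : candidatePositions(text)) {
            String s = text.substring(start, pos + 1).trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
            start = pos + 1;
        }
        String rest = text.substring(start).trim(); // trailing text without a terminator
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        EosScanner scanner = new EosScanner(INDIC_EOS);
        String text = "Ami bhat khai\u0964 Tumi ki khao?";
        System.out.println(scanner.candidatePositions(text)); // prints [13, 27]
        System.out.println(scanner.naiveSplit(text));
    }
}
```

The danda (U+0964) is a single UTF-16 char, so it only matches if the input was decoded as UTF-8. If the same bytes are read with a Latin-1 default charset they arrive as three unrelated characters, which is why the -encoding utf8 flag matters when training.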
