Re: Unicode danda in sentence detector

সৌভিক Wed, 30 May 2012 23:27:28 -0700

Thanks!

I didn't see the support in 1.5.2-incubating. I'll build from trunk and try.
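
For anyone following along, here is a rough sketch of what treating the danda (U+0964) as an end-of-sentence character means. It is only an illustration, not OpenNLP code: the class DandaSplitDemo and its methods are made-up names, and the naive rule "split at every EOS character" is a stand-in. As far as I understand, the real sentence detector treats EOS characters only as candidate boundaries and lets the trained model decide. The sample text uses Latin placeholders with a \u0964 escape so the file stays ASCII-safe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of OpenNLP.
public class DandaSplitDemo {
    // U+0964 DEVANAGARI DANDA, used as a full stop in Bengali, Hindi and related scripts.
    static final char DANDA = '\u0964';

    // Naive splitter: closes a sentence at every occurrence of an EOS character.
    // Assumption for illustration only; a trained model would score each candidate instead.
    static List<String> split(String text, char[] eosChars) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (isEos(text.charAt(i), eosChars)) {
                // Keep the EOS character with the sentence it terminates.
                String sentence = text.substring(start, i + 1).trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
                start = i + 1;
            }
        }
        // Trailing text without a terminating EOS character is kept as a last sentence.
        String tail = text.substring(start).trim();
        if (!tail.isEmpty()) {
            sentences.add(tail);
        }
        return sentences;
    }

    // Linear scan is fine here because EOS sets are tiny.
    static boolean isEos(char c, char[] eosChars) {
        for (char e : eosChars) {
            if (e == c) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char[] eos = {DANDA, '?', '!'};
        // Placeholder text; real input would be Bengali read as UTF-8.
        for (String s : split("First sentence\u0964 Second one? Third", eos)) {
            System.out.println(s);
        }
    }
}
```

With only '.', '?' and '!' in the EOS set, the first two placeholder sentences above would not be separated at the danda, which is the gap the EOS setting is meant to close.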


On Thu, May 31, 2012 at 7:05 AM, William Colen <[email protected]> wrote:

> As far as I know you don't need a CLA for a patch. Simply open a Jira and
> attach your patch to it.
>
> Besides what James pointed out, you may also want to change the EOS
> characters. There are two related new features that are already
> implemented in the trunk:
>
> https://issues.apache.org/jira/browse/OPENNLP-428
> This one added an optional command line argument where you set the
> end-of-sentence characters. This setting will be persisted to the model. If
> you are using the API you can create a SentenceDetectorFactory and use it
> to set the EOS chars.
>
> https://issues.apache.org/jira/browse/OPENNLP-434
> This is a new feature that allows customizing the SentenceDetector. You can
> extend the SentenceDetectorFactory and override methods as needed. You can
> pass in the customized factory using either the command line or the API.
>
>
> On Wed, May 30, 2012 at 7:19 PM, James Kosin <[email protected]>
> wrote:
>
> > Hi Soubhik,
> >
> > Should already be supported.
> > You have to pass the -encoding utf8 to the command line interface.
> >
> > James
> >
> > On 5/30/2012 1:52 PM, Soubhik (সৌভিক) wrote:
> > > Hi,
> > >
> > > I'm trying to use OpenNLP to train a sentence detector for the Bengali
> > > language ("bn"). I would like to add support for the Unicode danda
> > > character in the opennlp.tools.sentdetect.lang.Factory class. This
> > > character is a sentence break in Bengali, Hindi and several other
> > > Indian languages. The code change should be small (< 10 lines).
> > >
> > > Is it correct to think that a change of this size will not require a
> > > CLA?
> > >
> > > Ref: en.wikipedia.org/wiki/Danda
> > >
> > > Regards,
> > > Soubhik.
> > > --
> > >
> >
> >
>



-- 
Soubhik Bhattacharya
