On Wed, Dec 21, 2016 at 16:00, Matt Post <p...@cs.jhu.edu> wrote:

> Sure, that'd be nice to do. I'd love to get rid of the Perl scripts. Are
> you just throwing out an idea or are you interested in doing this?

I'd be happy to do it. If Joern can help out, that would of course be much appreciated.

> I think the way to go would be to set this up on a branch (off 7), and
> then I could test it on some languages.

Sure, and hopefully branch 7 becomes our new master soon after the 6.1 release.

Regards,
Tommaso

> > On Dec 21, 2016, at 5:33 AM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I was talking to Joern (Apache OpenNLP committer) recently and the idea
> > came up that we could use OpenNLP for the data preprocessing phase in
> > Joshua, to handle tokenization, sentence detection, etc.
> > As I was reading through our doc [1], this is currently done with dedicated
> > scripts; we could make that part pluggable (with a default simple Java
> > implementation) and allow more fine-grained control over it using libraries
> > like OpenNLP.
> >
> > What would people think?
> >
> > Regards,
> > Tommaso
> >
> > [1] : https://cwiki.apache.org/confluence/display/JOSHUA/Project+Ideas
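
To make the pluggable preprocessing idea from the quoted proposal a bit more concrete, here is a rough sketch of what the extension point could look like. Everything in it is an assumption for discussion: the names `Preprocessor` and `SimplePreprocessor` are hypothetical and do not exist in Joshua, and the default implementation is deliberately naive (a regex sentence splitter and a punctuation-separating tokenizer). An OpenNLP-backed implementation could sit behind the same interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical extension point for preprocessing; not part of the Joshua code base. */
interface Preprocessor {
    /** Splits raw text into sentences. */
    List<String> splitSentences(String text);

    /** Splits a single sentence into tokens. */
    List<String> tokenize(String sentence);
}

/** Naive default: regex-based, no models, no language-specific rules. */
class SimplePreprocessor implements Preprocessor {
    @Override
    public List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        // Break after '.', '!' or '?' when followed by whitespace.
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    @Override
    public List<String> tokenize(String sentence) {
        // Surround every ASCII punctuation character with spaces, then split on whitespace.
        // This also splits apostrophes and hyphens, which a real tokenizer would handle better.
        String spaced = sentence.replaceAll("(\\p{Punct})", " $1 ").trim();
        if (spaced.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(spaced.split("\\s+"));
    }
}

class PreprocessorDemo {
    public static void main(String[] args) {
        Preprocessor preprocessor = new SimplePreprocessor();
        for (String sentence : preprocessor.splitSentences("Hello, world! This is a test.")) {
            System.out.println(String.join(" ", preprocessor.tokenize(sentence)));
        }
    }
}
```

The point of keeping the interface this small is that the pipeline would only depend on `Preprocessor`, so switching between the simple default and a library-backed implementation would be a configuration choice rather than a code change.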