I haven't contributed up to this point, but I would like to see Apache Joshua
remain an active project, so I am volunteering to help. I may not be much
help with code for a while, but I can help out with documentation,
releases, etc.

I do agree that NMT is the best path forward, but I will leave the choice
between integrating an existing library into Joshua and writing a new NMT
implementation in Joshua to those more familiar with the code and with what
they think is best for the project.

Jeff


On Tue, Oct 6, 2020 at 2:28 AM Thamme Gowda <tgow...@gmail.com> wrote:

> Hi Tommaso and others,
>
> *1. I support the addition of a neural MT decoder.*
> The world has moved on, and NMT is clearly the way forward.
> If you don't believe me, read what Matt Post himself said [1] three
> years ago!
>
> I have spent the past three years focusing on NMT as part of my job and
> Ph.D., and I'd be glad to contribute in that direction.
> There are many NMT toolkits out there today (Fairseq, Sockeye,
> Tensor2Tensor, ...).
>
> The right thing to do, IMHO, is to simply merge one of these NMT toolkits
> into the Joshua project. We can do that as long as it's Apache-licensed,
> right?
> We would have to move toward Python, as most toolkits are written in
> Python. On the positive side, we would lose the ancient Perl scripts
> that many of us are not fans of.
>
> I have been working on my own NMT toolkit for my work and research -- RTG:
> https://isi-nlp.github.io/rtg/#conf
> I worked on Joshua in the past, mainly improving its code quality [2],
> so you can tell my new code would be up to Apache's standards ;)
>
> *2. Pretrained MT models for lots of languages*
> I have been working on a library to retrieve parallel data from many
> sources -- MTData [3].
> There is so much parallel data out there for hundreds of languages.
> My recent estimate is that over a billion lines of parallel sentences for
> over 500 languages are freely and publicly available for download with the
> MTData tool.
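>
> For anyone curious, here is a minimal sketch of how a download might be
> scripted in Python; the subcommands, flags, and dataset ID are from
> memory and only illustrative, so check mtdata --help and the README [3]
> for the actual interface:
>
>     # Sketch only: list and fetch parallel data via the mtdata CLI.
>     # Subcommands/flags ("list", "get", -l, --train, --out) and the
>     # dataset ID are assumed, not verified against a specific version.
>     import subprocess
>
>     # list datasets known for a language pair
>     subprocess.run(["mtdata", "list", "-l", "deu-eng"], check=True)
>
>     # download one training corpus into a local directory
>     subprocess.run(
>         ["mtdata", "get", "-l", "deu-eng",
>          "--train", "news_commentary_v14",
>          "--out", "deu-eng-data"],
>         check=True,
>     )
>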
> If we find some sponsors to lend us some resources, we could train better
> MT models and update the Language Packs section [4].
> Perhaps even one massively multilingual NMT model that supports many
> translation directions (I know it's possible with NMT; I tested it
> recently with RTG).
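>
> For context, the usual recipe for a single many-to-many model (not
> necessarily how RTG does it) is to prepend a target-language token to
> every source sentence, so one model learns all requested directions.
> A tiny illustrative sketch, with made-up function and token names:
>
>     # Illustrative only: tag each source sentence with its desired target
>     # language, in the style of Google's multilingual NMT tokens.
>     def tag_source(src_sentence: str, tgt_lang: str) -> str:
>         # "<2deu> hello world" asks the model to translate into German
>         return f"<2{tgt_lang}> {src_sentence}"
>
>     corpus = [("hello world", "deu"), ("good morning", "kan")]
>     tagged = [tag_source(src, tgt) for src, tgt in corpus]
>     print(tagged)  # ['<2deu> hello world', '<2kan> good morning']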
>
> I am interested in hearing what others are thinking.
>
> [1] https://mail-archives.apache.org/mod_mbox/joshua-dev/201709.mbox/%3CA481E867-A845-4BC0-B5AF-5CEAAB3D0B7D%40cs.jhu.edu%3E
> [2] https://github.com/apache/joshua/pulls?q=author%3Athammegowda+
> [3] https://github.com/thammegowda/mtdata
> [4] https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs
>
>
> Cheers,
> TG
>
> --
> *Thamme Gowda *
> @thammegowda <https://twitter.com/thammegowda> | https://isi.edu/~tg
> ~Sent via somebody's Webmail server
>
>
> On Mon, Oct 5, 2020 at 12:16 AM Tommaso Teofili <
> tommaso.teof...@gmail.com> wrote:
>
> > Hi all,
> >
> > This is a roll call for people interested in contributing to Apache
> > Joshua going forward.
> > Contributing could mean not just code, but anything that may help the
> > project or serve the community.
> >
> > If you're interested in helping out, please speak up :-)
> >
> > Code-wise, Joshua has not evolved much in recent months; there is room
> > both for improvements to the current code (to make a new minor release)
> > and for new ideas / code branches (e.g. a neural-MT-based Joshua
> > Decoder).
> >
> > Regards,
> > Tommaso
> >
>
