Thamme, Jeff,

Your contributions will be very important for the project and the
community, especially given your NLP background. Thanks for your support!

I agree moving towards NMT is the best thing to do at this point for Joshua.

Thamme, thanks for your suggestions!
I think we can now enter a research phase to understand which existing
toolkit we can most easily integrate with.
Of course, if you'd like to integrate your own toolkit, it would be even
more interesting to see how it compares to the others.

If that means moving to Python, I don't think it's a problem; we can still
work on Java bindings to ship a new Joshua Decoder implementation.

The pretrained models topic is, imho, something we will have to embrace at
some point, so that others can:
a) just download new LPs (language packs)
b) eventually fine-tune them with their own data

I am looking forward to starting this new phase of research on Joshua.

Regards,
Tommaso

On Tue, 6 Oct 2020 at 18:30, Jeff Zemerick <jzemer...@apache.org> wrote:

> I haven't contributed up to this point, but I would like to see Apache
> Joshua remain an active project, so I am volunteering to help. I may not
> be much help with code for a while, but I will help out with
> documentation, releases, etc.
>
> I do agree that NMT is the best path forward, but I will leave the choice
> between integrating an existing library into Joshua and writing a new NMT
> implementation in Joshua to those more familiar with the code and what
> they think is best for the project.
>
> Jeff
>
>
> On Tue, Oct 6, 2020 at 2:28 AM Thamme Gowda <tgow...@gmail.com> wrote:
>
> > Hi Tommaso and others,
> >
> > *1. I support the addition of a neural MT decoder.*
> > The world has moved on, and NMT is clearly the way forward.
> > If you don't believe me, read what Matt Post himself said [1] three
> > years ago!
> >
> > I have spent the past three years focusing on NMT as part of my job and
> > Ph.D., and I'd be glad to contribute in that direction.
> > There are many NMT toolkits out there today (Fairseq, Sockeye,
> > Tensor2Tensor, ...).
> >
> > The right thing to do, IMHO, is simply to merge one of the NMT toolkits
> > into the Joshua project. We can do that as long as it's under the Apache
> > License, right?
> > We will have to move toward Python land, as most toolkits are written in
> > Python. On the positive side, we will lose the ancient Perl scripts
> > that many are not fans of.
> >
> > I have been working on my own NMT toolkit for my work and research, RTG:
> > https://isi-nlp.github.io/rtg/#conf
> > I worked on Joshua in the past, mainly improving its code quality [2],
> > so you can tell my new code would be up to Apache's standards ;)
> >
> > *2. Pretrained MT models for lots of languages*
> > I have been working on a library to retrieve parallel data from many
> > sources: MTData [3].
> > There is so much parallel data out there for hundreds of languages.
> > By my recent estimate, over a billion lines of parallel sentences in over
> > 500 languages are freely and publicly available for download using the
> > MTData tool.
> > If we find some sponsors to lend us some resources, we could train better
> > MT models and update the Language Packs section [4].
> > Perhaps even one massively multilingual NMT model that supports many
> > translation directions (I know it's possible with NMT; I tested it
> > recently with RTG).
> >
> > I am interested in hearing what others are thinking.
> >
> > [1] - https://mail-archives.apache.org/mod_mbox/joshua-dev/201709.mbox/%3CA481E867-A845-4BC0-B5AF-5CEAAB3D0B7D%40cs.jhu.edu%3E
> > [2] - https://github.com/apache/joshua/pulls?q=author%3Athammegowda+
> > [3] - https://github.com/thammegowda/mtdata
> > [4] - https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs
> >
> >
> > Cheers,
> > TG
> >
> > --
> > *Thamme Gowda*
> > @thammegowda <https://twitter.com/thammegowda> | https://isi.edu/~tg
> > ~Sent via somebody's Webmail server
> >
> >
> > On Mon, Oct 5, 2020 at 12:16 AM Tommaso Teofili <
> > tommaso.teof...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > This is a roll call for people interested in contributing to Apache
> > > Joshua going forward.
> > > Contributions need not be just code; anything that helps the project
> > > or serves the community counts.
> > >
> > > If you're interested in helping out, please speak up :-)
> > >
> > > Code-wise, Joshua has not evolved much in recent months; there's room
> > > for both improvements to the current code (leading to a new minor
> > > release) and new ideas / code branches (e.g. a neural-MT-based Joshua
> > > Decoder).
> > >
> > > Regards,
> > > Tommaso
> > >
> >
>
