Tommaso and others,

> I think we may now go into a research phase to understand what existing
> toolkit we can more easily integrate with.
Agreed.
If we can write a (short) report that compares the various NMT toolkits of
2020, it would be useful both for us in making this decision and for the NMT
community.
Something like a survey paper on NMT research, but focused on toolkits and
the software engineering side.



On Fri, Oct 9, 2020 at 11:39 PM, Tommaso Teofili <
tommaso.teof...@gmail.com> wrote:

> Thamme, Jeff,
>
> your contributions will be very important for the project and the
> community, especially given your NLP background, thanks for your support!
>
> I agree moving towards NMT is the best thing to do at this point for
> Joshua.
>
> Thamme, thanks for your suggestions!
> I think we may now go into a research phase to understand what existing
> toolkit we can more easily integrate with.
> Of course, if you'd like to integrate your own toolkit, then it'd be even
> more interesting to see how it compares to others.
>
> If that means moving to Python I think it's not a problem; we can still
> work on Java bindings to ship a new Joshua Decoder implementation.
>
> The pretrained models topic is imho something we will have to embrace at
> some point, so that others can:
> a) just download new LPs
> b) eventually fine tune with their own data
>
> I am looking forward to starting this new phase of research on Joshua.
>
> Regards,
> Tommaso
>
> On Tue, 6 Oct 2020 at 18:30, Jeff Zemerick <jzemer...@apache.org> wrote:
>
> > I haven't contributed to this point but I would like to see Apache Joshua
> > remain an active project so I am volunteering to help. I may not be a lot
> > of help with code for a bit but I will help out with documentation,
> > releases, etc.
> >
> > I do agree that NMT is the best path forward but I will leave the choice of
> > integrating an existing library into Joshua versus a new NMT implementation
> > in Joshua to those more familiar with the code and what they think is best
> > for the project.
> >
> > Jeff
> >
> >
> > On Tue, Oct 6, 2020 at 2:28 AM Thamme Gowda <tgow...@gmail.com> wrote:
> >
> > > > Hi Tommaso, and others
> > >
> > > > *1. I support the addition of a neural MT decoder.*
> > > > The world has moved on, and NMT is clearly the way to go forward.
> > > > If you don't believe my words, read what Matt Post himself said [1]
> > > > three years ago!
> > >
> > > > I have spent the past three years focusing on NMT as part of my job
> > > > and Ph.D. -- I'd be glad to contribute in that direction.
> > > > There are many NMT toolkits out there today (Fairseq, Sockeye,
> > > > tensor2tensor, ...).
> > >
> > > > The right thing to do, IMHO, is simply to merge one of the NMT toolkits
> > > > with the Joshua project. We can do that as long as it's under the Apache
> > > > License, right?
> > > > We will now have to move towards Python land, as most toolkits are in
> > > > Python. On the positive side, we will be losing the ancient Perl
> > > > scripts that many are not fans of.
> > >
> > > > I have been working on my own NMT toolkit for my work and research --
> > > > RTG
> > > > https://isi-nlp.github.io/rtg/#conf
> > > > I worked on Joshua in the past; mainly, I improved the code quality
> > > > [2]. So you can tell my new code would be up to Apache's standards ;)
> > >
> > > > *2. Pretrained MT models for lots of languages*
> > > > I have been working on a library to retrieve parallel data from many
> > > > sources -- MTData [3].
> > > > There is so much parallel data out there for hundreds of languages.
> > > > My recent estimate is that over a billion lines of parallel sentences
> > > > for over 500 languages are freely and publicly available for download
> > > > using the MTData tool.
> > > > If we find some sponsors to lend us some resources, we could train
> > > > better MT models and update the Language Packs section [4].
> > > > Perhaps one massively multilingual NMT model that supports many
> > > > translation directions (I know it's possible with NMT; I tested it
> > > > recently with RTG).
> > >
> > > I am interested in hearing what others are thinking.
> > >
> > > [1]
> > >
> > >
> >
> https://mail-archives.apache.org/mod_mbox/joshua-dev/201709.mbox/%3CA481E867-A845-4BC0-B5AF-5CEAAB3D0B7D%40cs.jhu.edu%3E
> > > [2] - https://github.com/apache/joshua/pulls?q=author%3Athammegowda+
> > > [3] - https://github.com/thammegowda/mtdata
> > > [4] -
> https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs
> > >
> > >
> > > Cheers,
> > > TG
> > >
> > > --
> > > *Thamme Gowda *
> > > @thammegowda <https://twitter.com/thammegowda> | https://isi.edu/~tg
> > > ~Sent via somebody's Webmail server
> > >
> > >
> > > > On Mon, Oct 5, 2020 at 12:16 AM, Tommaso Teofili <
> > > > tommaso.teof...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > > This is a roll call for people interested in contributing to Apache
> > > > > Joshua going forward.
> > > > > Contributing could be not just code, but anything that may help the
> > > > > project or serve the community.
> > > >
> > > > In case you're interested in helping out please speak up :-)
> > > >
> > > > > Code-wise, Joshua has not evolved much in recent months; there's
> > > > > room for both improvements to the current code (make a new minor
> > > > > release) and new ideas / code branches (e.g. a neural MT based
> > > > > Joshua Decoder).
> > > >
> > > > Regards,
> > > > Tommaso
> > > >
> > >
> >
>
