Following up on the report topic, I've created an Overleaf doc for everyone
who's interested in working on this [1].

First set of (Apache License 2.0 compatible) NMT toolkits I've found:
- Joey NMT [2]
- OpenNMT [3]
- MarianNMT [4]
- Sockeye [5]
- and of course RTG already shared by Thamme [6]

Regards,
Tommaso

[1] : https://www.overleaf.com/8617554857qkvtqtpcxxmw
[2] : https://github.com/joeynmt/joeynmt
[3] : https://github.com/OpenNMT
[4] : https://github.com/marian-nmt/marian
[5] : https://github.com/awslabs/sockeye
[6] : https://github.com/isi-nlp/rtg-xt

On Wed, 14 Oct 2020 at 11:06, Tommaso Teofili <tommaso.teof...@gmail.com>
wrote:

> Very good idea, Thamme!
> I'd be up for writing such a short survey paper as a result of our
> analysis.
>
> Regards,
> Tommaso
>
>
> On Wed, 14 Oct 2020 at 05:23, Thamme Gowda <tgow...@gmail.com> wrote:
>
>> Tommaso and others,
>>
>> > I think we may now go into a research phase to understand what existing
>> > toolkit we can more easily integrate with.
>> Agreed.
>> If we can write a (short) report that compares the various NMT toolkits of
>> 2020, it would be useful to us in making this decision, as well as to the
>> NMT community.
>> Something like a survey paper on NMT research, but focused on toolkits and
>> the software engineering part.
>>
>>
>>
>> On Fri, Oct 9, 2020 at 11:39 PM, Tommaso Teofili <
>> tommaso.teof...@gmail.com> wrote:
>>
>> > Thamme, Jeff,
>> >
>> > your contributions will be very important for the project and the
>> > community, especially given your NLP background. Thanks for your
>> > support!
>> >
>> > I agree moving towards NMT is the best thing to do at this point for
>> > Joshua.
>> >
>> > Thamme, thanks for your suggestions!
>> > I think we may now go into a research phase to understand what existing
>> > toolkit we can more easily integrate with.
>> > Of course, if you'd like to integrate your own toolkit, it'd be even
>> > more interesting to see how it compares to others.
>> >
>> > If that means moving to Python I think it's not a problem; we can still
>> > work on Java bindings to ship a new Joshua Decoder implementation.
>> >
>> > The pretrained models topic is imho something we will have to embrace at
>> > some point, so that others can:
>> > a) just download new LPs
>> > b) eventually fine-tune with their own data
>> >
>> > I am looking forward to starting this new phase of research on Joshua.
>> >
>> > Regards,
>> > Tommaso
>> >
>> > On Tue, 6 Oct 2020 at 18:30, Jeff Zemerick <jzemer...@apache.org> wrote:
>> >
>> > > I haven't contributed to this point but I would like to see Apache Joshua
>> > > remain an active project so I am volunteering to help. I may not be a lot
>> > > of help with code for a bit but I will help out with documentation,
>> > > releases, etc.
>> > >
>> > > I do agree that NMT is the best path forward but I will leave the choice of
>> > > integrating an existing library into Joshua versus a new NMT implementation
>> > > in Joshua to those more familiar with the code and what they think is best
>> > > for the project.
>> > >
>> > > Jeff
>> > >
>> > >
>> > > On Tue, Oct 6, 2020 at 2:28 AM Thamme Gowda <tgow...@gmail.com> wrote:
>> > >
>> > > > Hi Tommaso and others,
>> > > >
>> > > > *1. I support the addition of a neural MT decoder.*
>> > > > The world has moved on, and NMT is clearly the way forward.
>> > > > If you don't believe my words, read what Matt Post himself said [1] three
>> > > > years ago!
>> > > >
>> > > > I have spent the past three years focusing on NMT as part of my job and
>> > > > Ph.D. -- I'd be glad to contribute in that direction.
>> > > > There are many NMT toolkits out there today (Fairseq, Sockeye,
>> > > > tensor2tensor, ...).
>> > > >
>> > > > The right thing to do, IMHO, is simply to merge one of the NMT toolkits
>> > > > with the Joshua project. We can do that as long as it's under the Apache
>> > > > License, right?
>> > > > We will now have to move towards Python land, as most toolkits are in
>> > > > Python. On the positive side, we will be losing the ancient Perl scripts
>> > > > that many are not fans of.
>> > > >
>> > > > I have been working on my own NMT toolkit for my work and research -- RTG
>> > > > https://isi-nlp.github.io/rtg/#conf
>> > > > I worked on Joshua in the past; mainly, I improved the code quality
>> > > > [2]. So you can tell my new code'd be up to Apache's standards ;)
>> > > >
>> > > > *2. Pretrained MT models for lots of languages*
>> > > > I have been working on a lib to retrieve parallel data from many sources --
>> > > > MTData [3]
>> > > > There is so much parallel data out there for hundreds of languages.
>> > > > My recent estimate is that over a billion lines of parallel sentences for
>> > > > over 500 languages are freely and publicly available for download using
>> > > > the MTData tool.
>> > > > If we find some sponsors to lend us some resources, we could train better
>> > > > MT models and update the Language Packs section [4].
>> > > > Perhaps one massively multilingual NMT model that supports many
>> > > > translation directions (I know it's possible with NMT; I tested it
>> > > > recently with RTG).
>> > > >
>> > > > I am interested in hearing what others are thinking.
>> > > >
>> > > > [1] - https://mail-archives.apache.org/mod_mbox/joshua-dev/201709.mbox/%3CA481E867-A845-4BC0-B5AF-5CEAAB3D0B7D%40cs.jhu.edu%3E
>> > > > [2] - https://github.com/apache/joshua/pulls?q=author%3Athammegowda+
>> > > > [3] - https://github.com/thammegowda/mtdata
>> > > > [4] - https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs
>> > > >
>> > > >
>> > > > Cheers,
>> > > > TG
>> > > >
>> > > > --
>> > > > *Thamme Gowda *
>> > > > @thammegowda <https://twitter.com/thammegowda> | https://isi.edu/~tg
>> > > > ~Sent via somebody's Webmail server
>> > > >
>> > > >
>> > > > On Mon, Oct 5, 2020 at 12:16 AM, Tommaso Teofili <
>> > > > tommaso.teof...@gmail.com> wrote:
>> > > >
>> > > > > Hi all,
>> > > > >
>> > > > > This is a roll call for people interested in contributing to Apache
>> > > > > Joshua going forward.
>> > > > > Contributing could mean not just code, but anything that may help the
>> > > > > project or serve the community.
>> > > > >
>> > > > > In case you're interested in helping out, please speak up :-)
>> > > > >
>> > > > > Code-wise, Joshua has not evolved much in recent months; there's room
>> > > > > for both improvements to the current code (make a new minor release) and
>> > > > > new ideas / code branches (e.g. neural MT based Joshua Decoder).
>> > > > >
>> > > > > Regards,
>> > > > > Tommaso
>> > > > >
>> > > >
>> > >
>> >
>>
>
