How's it different from Bicleaner or LASER?

On Tue, Jan 19, 2021 at 4:09 PM <mt-list-requ...@eamt.org> wrote:
> Today's Topics:
>
>    1. Re: CFP: WAT2021 (The 8th Workshop on Asian Translation)
>       (Adam Bittlingmayer)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 19 Jan 2021 12:00:26 +0400
> From: Adam Bittlingmayer <a...@modelfront.com>
> To: Toshiaki Nakazawa <nakaz...@logos.t.u-tokyo.ac.jp>
> Cc: mt-list@eamt.org
> Subject: Re: [Mt-list] CFP: WAT2021 (The 8th Workshop on Asian
>         Translation)
> Message-ID:
>         <calson-dwazwpk-v+znqmee4qqjyyzpmukf63h+afcw5dtyb...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> > How well is it working for low-resource langs?
>
> We try to support all language pairs. I've tried Inuktitut-English and
> Hindi-Marathi, for example.
>
> The main factors are:
>
> 1. How dirty your parallel corpus is
> In that sense, low-resource languages are often easier. The relative
> ranking just needs to be working.
>
> 2. How much data we have for the language
> My own language (Alemannic) is *not* working well. It's not in Mozilla
> TMs, BERT or LASER, and has no standard orthography. But a language like
> Armenian, with a smaller number of speakers and lower GDP, is working
> better, because their Wikipedia is among the top ones and their unique
> script makes it easy to identify. In this conference, I expect
> Oriya/Odia and Khmer will be the toughest.
>
> 3. How much data we have for the *pair*
> We have seen Hindi-Marathi and Russian-Armenian working decently, but they
> are well-established pairs with a lot of cultural overlap (Sprachbund).
>
> 4. Your use case
> Training from scratch for a generic system on very large datasets is
> different from fine-tuning for a domain on small data. (For the former,
> you usually want a strict 1:1 correspondence, e.g. miles should not be
> converted to kilometres.) It won't work well out of the box if you're
> doing adversarial attacks or need it calibrated across language pairs.
>
> 5. Whether the low-resource language is the source or the target language
> Just imagine a human doing this task who knows only one of the languages.
>
> There is an unknown language option (*other UND*), so you can even try it
> on languages not in the dropdown. That works better when the unknown
> language is the source rather than the target.
>
> If you see issues or have data that can improve a language pair, let me
> know.
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Mt-list mailing list
> Mt-list@eamt.org
> http://lists.eamt.org/mailman/listinfo/mt-list
>
> ------------------------------
>
> End of Mt-list Digest, Vol 88, Issue 16
> ***************************************

--
Best regards,
Nerses Nersesyan
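Here is a minimal sketch of two points from the quoted reply. The first is that "the relative ranking just needs to be working." The second is that calibration across language pairs is a separate, stricter requirement.

The sketch assumes some scorer has already produced one number per sentence pair. The scores, the toy corpus, and the helper names `filter_top_fraction` and `filter_by_threshold` are hypothetical. They are not part of ModelFront, Bicleaner or LASER.

```python
from typing import List, Sequence, Tuple

# A sentence pair is (source, target). Scores come from some external
# scorer, which is hypothetical here (any per-pair quality score).
Pair = Tuple[str, str]


def filter_top_fraction(pairs: Sequence[Pair], scores: Sequence[float],
                        keep: float) -> List[Pair]:
    """Keep the best-scoring `keep` fraction of pairs, in corpus order.

    Only the ordering of `scores` matters: any strictly increasing
    transform of the scores selects exactly the same pairs.
    """
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    if not 0.0 <= keep <= 1.0:
        raise ValueError("keep must be between 0 and 1")
    n_keep = int(len(pairs) * keep)
    # Rank indices best-first; sorted() is stable, so ties keep corpus order.
    ranked = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)
    return [pairs[i] for i in sorted(ranked[:n_keep])]


def filter_by_threshold(pairs: Sequence[Pair], scores: Sequence[float],
                        threshold: float) -> List[Pair]:
    """Keep pairs whose score is at least `threshold`.

    Unlike ranking, this depends on the absolute scale of the scores,
    so one threshold only means the same thing across language pairs
    if the scorer is calibrated.
    """
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    return [p for p, s in zip(pairs, scores) if s >= threshold]


# Toy corpus with one misaligned pair; the scores are made up.
corpus = [
    ("Hello.", "Hallo."),
    ("Good morning.", "Guten Morgen."),
    ("Click here.", "Guten Morgen."),  # misaligned
    ("Thank you.", "Danke."),
]
raw = [0.42, 0.37, 0.05, 0.40]
rescaled = [10 * s - 3 for s in raw]  # same order, different scale

top_raw = filter_top_fraction(corpus, raw, keep=0.5)
top_rescaled = filter_top_fraction(corpus, rescaled, keep=0.5)
print(top_raw == top_rescaled)  # True: ranking ignores the scale
print(len(filter_by_threshold(corpus, raw, 0.4)),
      len(filter_by_threshold(corpus, rescaled, 0.4)))  # 2 3
```

Keeping the top fraction of a corpus only needs the scorer to order pairs correctly. That is the weaker requirement described for dirty low-resource corpora. A single fixed threshold shared across language pairs changes behaviour as soon as the score scale shifts. That is the calibration case the reply says does not work out of the box.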