How's it different from Bicleaner or LASER?

On Tue, Jan 19, 2021 at 4:09 PM <mt-list-requ...@eamt.org> wrote:

> Send Mt-list mailing list submissions to
>         mt-list@eamt.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://lists.eamt.org/mailman/listinfo/mt-list
> or, via email, send a message with subject or body 'help' to
>         mt-list-requ...@eamt.org
>
> You can reach the person managing the list at
>         mt-list-ow...@eamt.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Mt-list digest..."
>
>
> Today's Topics:
>
>    1. Re: CFP: WAT2021 (The 8th Workshop on Asian Translation)
>       (Adam Bittlingmayer)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 19 Jan 2021 12:00:26 +0400
> From: Adam Bittlingmayer <a...@modelfront.com>
> To: Toshiaki Nakazawa <nakaz...@logos.t.u-tokyo.ac.jp>
> Cc: mt-list@eamt.org
> Subject: Re: [Mt-list] CFP: WAT2021 (The 8th Workshop on Asian
>         Translation)
> Message-ID:
>         <calson-dwazwpk-v+znqmee4qqjyyzpmukf63h+afcw5dtyb...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> > How well is it working for low-resource langs?
>
> We try to support all language pairs.  I've tried Inuktitut-English and
> Hindi-Marathi, for example.
>
> The main factors are:
>
> 1. How dirty your parallel corpus is
> In that sense, low-resource languages are often easier.  The relative
> ranking just needs to be working.
>
> 2. How much data we have for the language
> My own language (Alemannic) is *not* working well.  It's not in Mozilla
> TMs, BERT or LASER, and has no standard orthography.  But a language like
> Armenian, with fewer speakers and lower GDP, is working better, because
> its Wikipedia is strong and its unique script makes it easy to identify.
> In this conference, I expect Oriya/Odia and Khmer will be the toughest.
>
> 3. How much data we have for the *pair*
> We have seen Hindi-Marathi and Russian-Armenian working decently, but they
> are well-established pairs with a lot of cultural overlap (Sprachbund).
>
> 4. Your use case
> Training from scratch for a generic system on very large datasets is
> different from fine-tuning for a domain on small data.  (For the former,
> you usually want strict 1:1ness, e.g. miles should not convert to
> kilometres.)  It won't work well out of the box if you're doing adversarial
> attacks or need it calibrated across language pairs.
>
> 5. Whether the low-resource language is the source or the target language
> Just imagine a human doing this, who only knows one of the languages.
>
> There is an unknown language option (*other UND*) so you can even try it on
> languages not in the dropdown.  That works better if it's the source
> language, not the target language.
>
> If you see issues or have data that can improve a language pair, let me
> know.
>
> ------------------------------
>
> End of Mt-list Digest, Vol 88, Issue 16
> ***************************************
>


-- 
Best regards,
Nerses Nersesyan
_______________________________________________
Mt-list site list
Mt-list@eamt.org
http://lists.eamt.org/mailman/listinfo/mt-list
