The command actually trains the whole pipeline, including mining a
transliteration corpus from the parallel corpus. The language model for the
transliteration module is trained on characters (the target side of the
mined transliteration corpus). But as Anoop mentioned, if you already have a
transliteration corpus, you can bypass the mining step and run the
remaining pipeline, or concatenate the mined corpus with yours. The mined
transliteration corpus sometimes contains useful pairs that are not exact
transliterations, but they still help. See the following related paper for
details:

https://www.aclweb.org/anthology/E/E14/E14-4029.pdf

By the way, the `post-decoding-transliteration.pl` script uses a regular
language model (trained on words) to select the best transliteration, given
the context.
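
To make the character-level setup concrete, here is a minimal sketch of
character segmentation for the transliteration LM: each word is split into
space-separated characters. The function name, the file handling, and the
`_` word-boundary marker are illustrative only; the actual Moses scripts may
use different conventions.

```python
def char_segment(line: str) -> str:
    """Split every word in a line into space-separated characters,
    inserting an explicit marker between words so word boundaries
    survive the segmentation."""
    words = line.strip().split()
    return " _ ".join(" ".join(word) for word in words)

# Each corpus line is segmented before training the character LM, e.g.:
print(char_segment("nikhil tvs"))  # n i k h i l _ t v s
```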

Nadir



On Wed, Jan 31, 2018 at 1:37 PM, <[email protected]> wrote:

> Send Moses-support mailing list submissions to
>         [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://mailman.mit.edu/mailman/listinfo/moses-support
> or, via email, send a message with subject or body 'help' to
>         [email protected]
>
> You can reach the person managing the list at
>         [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Moses-support digest..."
>
>
> Today's Topics:
>
>    1. New to Moses,     Need Information regarding Transliteration
>       (nikhil t.v.s)
>    2. Re: Improving Accuracy Level (Raj Dabre)
>    3. Re: New to Moses, Need Information regarding Transliteration
>       (Anoop (?????))
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 31 Jan 2018 15:06:08 +0530
> From: "nikhil t.v.s" <[email protected]>
> Subject: [Moses-support] New to Moses,  Need Information regarding
>         Transliteration
> To: [email protected]
> Message-ID:
>         <CACxbnusgtLGjdApYLBE_vDN91wWLoE+ZU8mP18+
> [email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>       Have some questions regarding transliteration training using Moses.
> There is minimal documentation.
>
>
> ../mosesdecoder/scripts/Transliteration/train-transliteration-module.pl \
>     --corpus-f <foreign text> --corpus-e <target text> \
>     --alignment <path to aligned text> \
>     --moses-src-dir <moses decoder path> \
>     --external-bin-dir <external tools> \
>     --input-extension <input extension> --output-extension <output extension> \
>     --srilm-dir <sri lm binary path> --out-dir <path to generate output files>
>
>
> Is the lm trained over words or characters?
> A parallel corpus of source language word and destination language word
> will suffice?
> Should the corpus be character segmented for training?
>
>
> Regards,
> T. Venkata Sai Nikhil,
> International Institute of Information Technology -Hyderabad.
>
> ------------------------------
>
> Message: 2
> Date: Wed, 31 Jan 2018 18:56:23 +0900
> From: Raj Dabre <[email protected]>
> Subject: Re: [Moses-support] Improving Accuracy Level
> To: Emmanuel Dennis <[email protected]>
> Cc: [email protected]
> Message-ID:
>         <CAB3gfjB_DbMC4PvmqpOE_YjVWC9qpO9ZUQJTWH9VxkVvPGo_Fw@
> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> You might want to look up "N-best list re-scoring using neural features"
>
> On Mon, Jan 29, 2018 at 6:39 PM, Emmanuel Dennis <
> [email protected]>
> wrote:
>
> > Hi!
> >
> > What are some of the deep learning/big data strategies that can be used
> to
> > improve the accuracy level of a developed statistical machine translation
> > system?
> >
> >
> > Your assistance will be highly appreciated.
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
>
>
> --
> Raj Dabre.
> Doctoral Student,
> Graduate School of Informatics,
> Kyoto University.
> CSE MTech, IITB., 2011-2014
>
> ------------------------------
>
> Message: 3
> Date: Wed, 31 Jan 2018 10:36:41 +0000
> From: Anoop (?????)     <[email protected]>
> Subject: Re: [Moses-support] New to Moses,      Need Information regarding
>         Transliteration
> To: "nikhil t.v.s" <[email protected]>
> Cc: [email protected]
> Message-ID:
>         <CADXxMYfeCfakjYw0ftn1-t=8Z5ZAQscii6XLJY4oEm=kD8EQow@
> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Nikhil,
>
> The command you have mentioned is to be used for transliteration mining
> from a parallel translation corpus. If you already have a parallel
> transliteration corpus, you can ignore this command and use the standard
> Moses pipeline to train the model. The parallel transliteration corpus
> should be character segmented, and you should train the LM over characters.
>
> Anoop.
>
> On Wed 31 Jan, 2018, 15:08 nikhil t.v.s, <[email protected]>
> wrote:
>
> > Hi,
> >       Have some questions regarding transliteration training using Moses.
> > There is minimal documentation.
> >
> >
> > ../mosesdecoder/scripts/Transliteration/train-transliteration-module.pl \
> >     --corpus-f <foreign text> --corpus-e <target text> \
> >     --alignment <path to aligned text> \
> >     --moses-src-dir <moses decoder path> \
> >     --external-bin-dir <external tools> \
> >     --input-extension <input extension> --output-extension <output extension> \
> >     --srilm-dir <sri lm binary path> --out-dir <path to generate output files>
> >
> >
> > Is the lm trained over words or characters?
> > A parallel corpus of source language word and destination language word
> > will suffice?
> > Should the corpus be character segmented for training?
> >
> >
> > Regards,
> > T. Venkata Sai Nikhil,
> > International Institute of Information Technology -Hyderabad.
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
>
> ------------------------------
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> End of Moses-support Digest, Vol 135, Issue 34
> **********************************************
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
