Hi WULongski,

When you enable

 [TRAINING]
 transliteration-module = "yes"


in the EMS configuration file, it simply trains a transliteration model from
your word-aligned parallel corpus. This involves (i) mining a transliteration
corpus and then (ii) training the entire phrase-based pipeline over the
character-level corpus that was just mined. At the end you have a
transliteration model that can be used to transliterate OOVs.
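
To make step (ii) concrete: a character-level corpus is the mined word pairs
with every character written as a separate token, so the ordinary phrase-based
tools treat characters the way they normally treat words. The sketch below
assumes that representation; `to_char_corpus` and `from_char_line` are
hypothetical helper names, not part of the Moses scripts, and the example pair
is made up.

```python
def to_char_corpus(pairs):
    """Turn mined (source_word, target_word) pairs into two parallel lists of
    lines in which every character is a separate, space-separated token."""
    src_lines = [" ".join(src) for src, _ in pairs]
    tgt_lines = [" ".join(tgt) for _, tgt in pairs]
    return src_lines, tgt_lines


def from_char_line(line):
    """Glue a character-level output line back into a single word."""
    return "".join(line.split())


if __name__ == "__main__":
    src, tgt = to_char_corpus([("Moskva", "Moscow")])
    print(src[0])                  # M o s k v a
    print(tgt[0])                  # M o s c o w
    print(from_char_line(tgt[0]))  # Moscow
```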

You can use the model to transliterate in a post-decoding step, i.e. after the
actual decoder has run, when all that is left is to transliterate the OOVs.
This is enabled with

 post-decoding-transliteration = "yes"
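
Conceptually, the post-decoding pass is a second step over the decoder output.
A minimal sketch, assuming the decoder has passed each unknown source word
through to the output verbatim (the Moses decoder's default for unknown words)
and that the vocabulary known to the translation model is available.
`transliterate` stands in for the trained transliteration model and is a plain
dictionary lookup here; all names are hypothetical and this is not the EMS
implementation.

```python
def post_decoding_transliterate(source, output, known_words, transliterate):
    """Replace OOVs that the decoder copied into `output` by their transliteration.

    source, output: tokenized sentences (lists of strings)
    known_words:    vocabulary covered by the translation model
    transliterate:  callable mapping one OOV word to its transliteration
    """
    oovs = {w for w in source if w not in known_words}
    return [transliterate(tok) if tok in oovs else tok for tok in output]


if __name__ == "__main__":
    # Toy stand-in for a trained transliteration model.
    model = {"Путин": "Putin"}
    result = post_decoding_transliterate(
        source=["Путин", "приехал"],
        output=["Путин", "arrived"],
        known_words={"приехал"},
        transliterate=lambda w: model.get(w, w),
    )
    print(" ".join(result))  # Putin arrived
```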


Alternatively, you can transliterate at the same time as the actual decoding
takes place:

 in-decoding-transliteration = "yes"

This allows the decoder to reorder OOVs along with the regular words, but I
did not get better BLEU scores with it on average.

The current implementation is independent of tuning, i.e. you don't have to
retune the system when you enable transliteration. Tuning the transliteration
parameters (LM-OOV, transliteration phrase table, etc.) did not improve
results, so I just fixed the weights. Currently the LM-OOV feature gets
precedence.

>> What does --alignment mean? Does it mean I need another corpus? I only
>> have the en-fr corpus. What does the aligned text mean?

You need word alignments of your parallel corpus to mine transliteration
pairs; the miner works on a list of 1-1 aligned word pairs.
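
For illustration, one way to get such a 1-1 list from a word-aligned corpus is
to keep only links where neither word is aligned to anything else. This is a
minimal sketch, assuming alignments in the usual Moses "i-j" format (0-based
source index and target index per link). It is not the miner itself, which
then has to decide which of these pairs are actually transliterations, and
`one_to_one_pairs` is a hypothetical name. Links that point past the end of a
sentence, like those behind the "out of bounds" warnings quoted further down,
are simply skipped.

```python
from collections import Counter


def one_to_one_pairs(src_line, tgt_line, align_line):
    """Return (source_word, target_word) pairs that are aligned 1-1.

    Assumes the "i-j" alignment format: 0-based source index i and
    target index j for each link, separated by spaces.
    """
    src, tgt = src_line.split(), tgt_line.split()
    links = []
    for point in align_line.split():
        i, j = (int(x) for x in point.split("-"))
        if i < len(src) and j < len(tgt):  # drop out-of-range links
            links.append((i, j))
    # How many links each source / target position takes part in.
    src_count = Counter(i for i, _ in links)
    tgt_count = Counter(j for _, j in links)
    return [(src[i], tgt[j]) for i, j in links
            if src_count[i] == 1 and tgt_count[j] == 1]


if __name__ == "__main__":
    print(one_to_one_pairs("je ne sais pas", "i do not know",
                           "0-0 1-2 2-3 3-2"))
    # [('je', 'i'), ('sais', 'know')]
```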

>> --srilm-dir <sri lm binary path>: does it mean that I should install
>> SRILM?

It will use lmplz (from KenLM) if you don't specify --srilm-dir, so you do
not need to install SRILM.

>> I want to know whether, to translate French to English, I should first do
>> the baseline steps and then the steps in

Transliterating OOVs helps when the source and target languages are written
in different scripts. For French and English, simply copying the unknown word
over to the output would be more fruitful. Transliteration may help with some
interesting cognates and with interesting transformations of borrowed words,
but I don't think it will improve translation quality.
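
The rule of thumb above (transliterate only across scripts, otherwise copy) can
be sketched as a per-OOV decision. This is a rough heuristic, not something the
Moses scripts do: it labels a word's script by the first word of the Unicode
character name of its first letter, using Python's standard `unicodedata`
module. `script_of` and `handle_oov` are hypothetical names.

```python
import unicodedata


def script_of(word):
    """Rough script label ("LATIN", "CYRILLIC", "ARABIC", "CJK", ...) taken from
    the Unicode name of the word's first letter; None if it has no letters."""
    for ch in word:
        if ch.isalpha():
            return unicodedata.name(ch, "UNKNOWN").split()[0]
    return None


def handle_oov(word, target_script="LATIN"):
    """Copy OOVs already in the target script; flag the rest for transliteration."""
    if script_of(word) in (None, target_script):
        return ("copy", word)
    return ("transliterate", word)


if __name__ == "__main__":
    for w in ["Québec", "Москва", "2016"]:
        print(handle_oov(w))
    # ('copy', 'Québec')
    # ('transliterate', 'Москва')
    # ('copy', '2016')
```

For French to English both sides are written in Latin script, so every OOV
lands in the "copy" branch, which matches the advice above.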


>> What do --input-extension <input extension> and --output-extension
>> <output-extension> mean?
>> If I just want to translate French to English, should I use
>> --input-extension fr --output-extension en ?

Yes! But try using EMS rather than running the commands manually. It is much
easier.

Nadir

On Fri, Dec 30, 2016 at 7:25 PM, <[email protected]> wrote:

> Send Moses-support mailing list submissions to
>         [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://mailman.mit.edu/mailman/listinfo/moses-support
> or, via email, send a message with subject or body 'help' to
>         [email protected]
>
> You can reach the person managing the list at
>         [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Moses-support digest..."
>
>
> Today's Topics:
>
>    1. FYI: How the results of mkcls actually used during        Moses
>       training (Lane Schwartz)
>    2. some questions about the OOV ( WULongski )
>    3. something about the Unsupervised Transliteration  Model
>       ( WULongski )
>    4. Re: Moses-support Digest, Vol 122, Issue 38 (Mike Ladwig)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 29 Dec 2016 11:18:49 -0600
> From: Lane Schwartz <[email protected]>
> Subject: [Moses-support] FYI: How the results of mkcls actually used
>         during  Moses training
> To: "[email protected]" <[email protected]>
> Message-ID:
>         <CABv3vZkT+DFhWyZ_-2U+GeF+0erQN-iNjYsL=p6kTDOUsUzWYw@
> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> This email is simply to record a (to my knowledge) previously undocumented
> aspect of how the Moses training scripts interact with giza++.
>
>
> I've been looking through moses/scripts/training/train-model.perl and the
> execution scripts created by EMS, and I ran across a Perl function called
> make_classes, which (not surprisingly) calls mkcls. This didn't surprise
> me, as I assumed that giza++ used the resulting classes. But in examining
> the subsequent calls to giza++ (or mgiza), I couldn't see anywhere else in
> the Moses training pipeline that actually uses the *.vcb.classes files
> resulting from the calls to mkcls.
>
> Now, there are certainly use cases where a researcher might want to
> explicitly make use of these classes (a class LM, for example). But mkcls
> is called by default whenever training Moses using train-model.perl, and in
> the general case, I couldn't find any place where these classes are
> subsequently used. So I wondered: Am I missing something obvious? Are the
> results of mkcls actually used anywhere by default in the Moses training
> pipeline?
>
> After running mgiza --help, it appears that mgiza can accept these class
> files, but it appears that train-model.perl is not actually explicitly
> providing these class files to mgiza. So, I tried running mgiza as it was
> called by train-model.perl in a clean directory, providing it only the
> files that mgiza actually was provided via command flags (the src-tgt.cooc,
> tgt.vcb, and src.vcb files). Run this way, mgiza complains:
>
> ERROR: can not read src.vcb.classes
> ERROR: can not read tgt.vcb.classes
>
> So, the answer is that mgiza does actually need these files, but
> train-model.perl does not explicitly provide them to mgiza, instead relying
> on the fact that mgiza defaults to assuming that the class files exist in
> the same location as the vcb files with the same prefix, but the additional
> suffix .classes
>
> Thanks,
> Lane
>
> ------------------------------
>
> Message: 2
> Date: Fri, 30 Dec 2016 10:36:00 +0800
> From: " WULongski " <[email protected]>
> Subject: [Moses-support] some questions about the OOV
> To: " moses-support " <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="gb18030"
>
> Hi,
>  1.
>  I read about the methods for handling OOVs in
> http://www.statmt.org/moses/?n=Advanced.OOVs#ntoc1. Now I want to use the
> Unsupervised Transliteration Model, which means that I should first train a
> transliteration module and then train Moses with the transliteration option.
> But the baseline web page says I should do the tuning steps afterwards to
> optimize the parameters, while the OOVs page doesn't do any tuning. So I am
> confused. I think I should do the tuning, but if so, how? The same way as in
> the baseline? Or should I add some parameters to this command:
>
>     nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl \
>       ~/corpus/news-test2008.true.fr ~/corpus/news-test2008.true.en \
>       ~/mosesdecoder/bin/moses train/model/moses.ini \
>       --mertdir ~/mosesdecoder/bin/ &> mert.out &
>
>
>
>  2.
>  Some questions about the parameters in
> http://www.statmt.org/moses/?n=Advanced.OOVs#ntoc1
>
> Execute command to train transliteration:
>
>     ../mosesdecoder/scripts/Transliteration/train-transliteration-module.pl \
>       --corpus-f <foreign text> --corpus-e <target text> \
>       --alignment <path to aligned text> \
>       --moses-src-dir <moses decoder path> --external-bin-dir <external tools> \
>       --input-extension <input extension> --output-extension <output-extension> \
>       --srilm-dir <sri lm binary path> --out-dir <path to generate output files>
>
>  What does --alignment mean? Does it mean I need another corpus? I only
> have the en-fr corpus. What does the aligned text mean?
>
>  What do --input-extension <input extension> and --output-extension
> <output-extension> mean?
>  If I just want to translate French to English, should I use
> --input-extension fr --output-extension en ?
>
>
>  Thank you very much!
>
> ------------------------------
>
> Message: 3
> Date: Fri, 30 Dec 2016 11:07:51 +0800
> From: " WULongski " <[email protected]>
> Subject: [Moses-support] something about the Unsupervised
>         Transliteration Model
> To: " moses-support " <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi,
>  On the web page http://www.statmt.org/moses/?n=Advanced.OOVs, in the
> section
>
> Steps for use outside experiment.perl
>
> Execute command to train transliteration:
>
>     ../mosesdecoder/scripts/Transliteration/train-transliteration-module.pl \
>       --corpus-f <foreign text> --corpus-e <target text> \
>       --alignment <path to aligned text> \
>       --moses-src-dir <moses decoder path> --external-bin-dir <external tools> \
>       --input-extension <input extension> --output-extension <output-extension> \
>       --srilm-dir <sri lm binary path> --out-dir <path to generate output files>
>
>  --srilm-dir <sri lm binary path>: does it mean that I should install
> SRILM?
>  I want to know whether, to translate French to English, I should first do
> the baseline steps and then the steps in
> http://www.statmt.org/moses/?n=Advanced.OOVs.
>
>  Thank you very much!
>
> ------------------------------
>
> Message: 4
> Date: Fri, 30 Dec 2016 11:25:52 -0500
> From: Mike Ladwig <[email protected]>
> Subject: Re: [Moses-support] Moses-support Digest, Vol 122, Issue 38
> To: Hieu Hoang <[email protected]>
> Cc: Moses Support <[email protected]>
> Message-ID:
>         <CAB3VaD16w4aeAcAf163CfAzS7HDtVyBsAZKj0j7B-xKfkxSMMQ@mail.
> gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Wed, Dec 28, 2016 at 4:37 AM, Hieu Hoang <[email protected]> wrote:
>
> >> I am getting significantly (~20%) lower bleu scores than with 2.x but I
> >> have a lot of testing before I will know why.
> >>
> > Moses and Moses2 should give very similar results. Please let me know what
> > you find
> >
>
> In looking at training logs, I am getting many messages like this:
>
> WARNING: sentence 540930 has alignment point (4, 3) out of bounds (4, 4)
> T: europe is changing .
> S: europa verandert sich .
> WARNING: sentence 540931 has alignment point (9, 5) out of bounds (9, 10)
> T: that was the slogan of the last european elections .
> S: das war das motto der letzten europa wahlen .
> WARNING: sentence 540932 has alignment point (6, 0) out of bounds (6, 6)
> T: personally , i am convinced .
> S: personlich stimme ich dem zu .
>
> Thoughts?
> mike.
>
> ------------------------------
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> End of Moses-support Digest, Vol 122, Issue 42
> **********************************************
>