Fix formatting...
Hey,
BilingualLM is implemented and as of last week resides within
moses master:
https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp
To compile it you need a neural network backend. Two are currently
supported: OxLM and NPLM. Adding a new backend is relatively easy:
you need to implement the interface shown here:
https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/bilingual-lm/BiLM_NPLM.h
To compile with the OxLM backend, compile Moses with the
switch --with-oxlm=/path/to/oxlm
To compile with the NPLM backend, compile Moses with the
switch --with-nplm=/path/to/nplm (you need this fork of NPLM:
https://github.com/rsennrich/nplm).
Unfortunately, documentation is not yet available, so here is a
short summary of how to train a model and use it with the NPLM
backend:
Use the extract_training script to prepare the aligned bilingual
corpus:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/extract_training.py
You need the following options:
  -e, --target-language (string): mandatory; the target language,
      for example es.
  -f, --source-language (string): mandatory; the source language,
      for example en.
  -c, --corpus (string): the corpus stem, path/to/corpus. The
      directory you specify should contain the files
      corpus.sourcelang and corpus.targetlang.
  -t, --tagged-corpus (string): optional; tagged corpus stem, used
      for backoff to POS tags.
  -a, --align (string): mandatory; the alignment file.
  -w, --working-dir (string): output directory of the model.
  -n, --target-context (int): target context size.
  -m, --source-context (int): source context size. The actual
      context size is 2*m + 1, because m is the number of words
      on each side (left and right).
  -s, --prune-source-vocab (int): cutoff threshold for the source
      vocabulary.
  -p, --prune-target-vocab (int): cutoff threshold for the target
      vocabulary.
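To make the 2*m + 1 source context concrete, here is a minimal Python sketch of how such a window could be cut out of a source sentence. This is not the code from extract_training.py: the function name source_window, the padding token "<null>" and the example sentence are all assumptions made up for illustration.

```python
def source_window(source_tokens, center, m, pad="<null>"):
    """Return the 2*m + 1 source tokens centred on index `center`.

    Positions that fall outside the sentence are filled with `pad`
    (a hypothetical padding token, not necessarily what Moses uses).
    """
    window = []
    for i in range(center - m, center + m + 1):
        if 0 <= i < len(source_tokens):
            window.append(source_tokens[i])
        else:
            window.append(pad)
    return window


# Hypothetical example: m = 2 gives a window of 2*2 + 1 = 5 tokens.
source = "das ist ein kleines Haus".split()
window = source_window(source, center=1, m=2)
print(window)
```

With m = 4 the same function would return 9 tokens, which matches a source_ngrams value of 9 if source_ngrams is taken to be this window size.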
Then, use the training script to train the model:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/train_nplm.py
Example execution:
  train_nplm.py -w de-en-500250source/ -r de-en150nopos-source750 \
    -n 16 -d 0 \
    --nplm-home=/home/abmayne/code/deepathon/nplm_one_layer/ \
    -c corpus.1.word -i 750 -o 750
where:
  -i and -o are the input and output embedding sizes
  -n is the total n-gram size
  -d is the number of hidden layers
  -w and -c are the same as the extract_training options
  -r is the output directory of the model
Consult the Python script for a more detailed description of the
options.
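If you want to launch training from another script, the example command can be assembled as an argument list. The sketch below only uses the flags shown in the example; the helper name train_nplm_argv and all paths are placeholders I made up, and the final subprocess call is left commented out.

```python
import shlex


def train_nplm_argv(working_dir, output_dir, corpus, nplm_home,
                    ngram_size, hidden_layers, input_dim, output_dim):
    """Build the argument list for train_nplm.py (flags as in the example)."""
    return [
        "train_nplm.py",
        "-w", working_dir,            # same working dir as extract_training
        "-r", output_dir,             # output directory of the model
        "-n", str(ngram_size),        # total n-gram size
        "-d", str(hidden_layers),     # number of hidden layers
        "--nplm-home=" + nplm_home,   # location of the NPLM build
        "-c", corpus,                 # same corpus stem as extract_training
        "-i", str(input_dim),         # input embedding size
        "-o", str(output_dim),        # output embedding size
    ]


# Placeholder paths; replace them with your own.
argv = train_nplm_argv("/path/to/working-dir", "/path/to/model-dir",
                       "corpus.1.word", "/path/to/nplm", 16, 0, 750, 750)
print(" ".join(shlex.quote(a) for a in argv))
# To actually run it: subprocess.check_call(argv)
```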
After you have done that, the output directory should contain a
trained bilingual neural network language model.
To run it in Moses as a feature function, add the following to the
[feature] section of moses.ini, as a single line (wrapped here):
BilingualNPLM
filepath=/mnt/gna0/nbogoych/new_nplm_german/de-en150nopos/train.10k.model.nplm.10
target_ngrams=4 source_ngrams=9
source_vocab=/mnt/gna0/nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.source
target_vocab=/mnt/gna0/nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.targe
The source and target vocab files are located in the working
directory used to prepare the neural network language model.
target_ngrams doesn't include the predicted word (so
target_ngrams = 4 means 1 predicted word and 4 target
context words).
The total order of the model is target_ngrams + source_ngrams + 1
(4 + 9 + 1 = 14 for the line above).
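As a rough sketch, the feature line and the resulting model order can be generated from the individual settings. The helper names bilingual_nplm_line and model_order, and all the paths, are hypothetical; only the key names come from the line above.

```python
def bilingual_nplm_line(model_path, source_vocab, target_vocab,
                        target_ngrams, source_ngrams):
    """Format the BilingualNPLM feature line as a single moses.ini line."""
    fields = [
        "BilingualNPLM",
        f"filepath={model_path}",
        f"target_ngrams={target_ngrams}",
        f"source_ngrams={source_ngrams}",
        f"source_vocab={source_vocab}",
        f"target_vocab={target_vocab}",
    ]
    return " ".join(fields)


def model_order(target_ngrams, source_ngrams):
    """Total order: target context + source context + the predicted word."""
    return target_ngrams + source_ngrams + 1


# Placeholder paths; the vocab files live in the extract_training working dir.
line = bilingual_nplm_line("/path/to/model.nplm",
                           "/path/to/source_vocab_file",
                           "/path/to/target_vocab_file",
                           target_ngrams=4, source_ngrams=9)
order = model_order(4, 9)
print(line)
print(order)
```

Comparing the computed order against the -n value used for training is a quick sanity check that the decoder settings and the trained model agree.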
I will write proper documentation in the coming weeks. If
you have any problems running it, please contact me.
Cheers,
Nick
On Wed, Nov 26, 2014 at 1:02 PM, Nikolay Bogoychev
<[email protected]> wrote:
On Wed, Nov 26, 2014 at 11:53 AM, Tom Hoar
<[email protected]> wrote:
Hieu,
Sorry I missed you in Vancouver. I just reviewed your
slide deck from the MosesCore TAUS Round Table in
Vancouver
(taus-moses-industry-roundtable-2014-changes-in-moses-hieu-hoang-university-of-edinburgh).
In particular, I'm interested in the "Bilingual Language
Models" that "replicate Delvin et al, 2014". A search on
statmt.org/moses doesn't show
any hits searching for "delvin". So, A) is the code
finished? If so B) are there any instructions how to
enable/use this feature? If not, C) what kind of help do
you need to test the code for release?
--
Best regards,
Tom Hoar
Managing Director
*Precision Translation Tools Co., Ltd.*
Bangkok, Thailand
Web: www.precisiontranslationtools.com
Mobile: +66 87 345-1875
Skype: tahoar
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support