If you have a very small corpus at hand, just use the Witten-Bell smoothing method. Also, do not go beyond order 3.

Best,
Marcello

Marcello Federico
FBK-irst
Trento, Italy
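A minimal sketch of this advice, reusing the file names from the question quoted below: the -wbdiscount option selects Witten-Bell discounting in SRILM, which does not need the count-of-counts statistics that modified Kneser-Ney requires and so avoids the error below on small, closed-vocabulary POS corpora.

///////////////////////////////////////////////////////////////////
/home/srilm/bin/i686/ngram-count -order 3 -interpolate -wbdiscount -text EN_pos.txt -lm pos.lm
///////////////////////////////////////////////////////////////////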
----- Original Message -----
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: 'Philipp Koehn' <[EMAIL PROTECTED]>
Cc: [email protected] <[email protected]>
Sent: Thu Jun 19 03:28:24 2008
Subject: [Moses-support] encoding for parallel corpus

Hi,

I have a problem. I downloaded the corpus "factored-corpus.tgz" from the Moses page, which contains a file named "pos.lm", and I want to know how to train that file. I POS-tagged my English sentences, e.g. "the|DT light|NN was|VBD red|JJ .|.", and extracted the POS tags to get sentences such as "DT NN VBD JJ ." (a sketch of this extraction step is given after this message). Then I trained on these POS sentences with SRILM using the following command:

///////////////////////////////////////////////////////////////////
/home/srilm/bin/i686/ngram-count -order 3 -interpolate -kndiscount -text EN_pos.txt -lm pos.lm

one of required modified KneserNey count-of-counts is zero
error in discount estimator for order 1
///////////////////////////////////////////////////////////////////

In this case no LM file is generated. When I remove the parameters "-interpolate -kndiscount":

///////////////////////////////////////////////////////////////////
/home/srilm/bin/i686/ngram-count -order 3 -text EN_pos.txt -lm pos.lm

warning: no singleton counts
GT discounting disabled
warning: discount coeff 1 is out of range: 0.666667
warning: discount coeff 2 is out of range: 0.800271
warning: discount coeff 3 is out of range: 0.439665
warning: discount coeff 4 is out of range: 0.918576
warning: discount coeff 6 is out of range: 0.860417
warning: discount coeff 7 is out of range: 0.900741
warning: discount coeff 1 is out of range: 2.25939
warning: discount coeff 3 is out of range: -0.0390595
warning: discount coeff 4 is out of range: 1.6028
warning: discount coeff 5 is out of range: 1.62952
warning: discount coeff 6 is out of range: -0.17675
BOW denominator for context "NN" is zero; scaling probabilities to sum to 1
BOW denominator for context "VB" is zero; scaling probabilities to sum to 1
BOW denominator for context "IN" is zero; scaling probabilities to sum to 1
///////////////////////////////////////////////////////////////////

In this case an LM file is generated, but when I execute the command:

///////////////////////////////////////////////////////////////////
mert-moses.pl input ref moses/moses-cmd/src/moses model/moses.ini -nbest 200 --working-dir tuning --rootdir /home/moses_new/bin/moses-scripts/scripts-20080519-1755
///////////////////////////////////////////////////////////////////

I get this error:

///////////////////////////////////////////////////////////////////
Loading table into memory...done.
Created lexical orientation reordering
Start loading LanguageModel /home/yqhe/iwslt2007/moses_new/enfactordata/lm/en.lm : [0.000] seconds
Start loading LanguageModel /home/yqhe/iwslt2007/moses_new/enfactordata/lm/pos.lm : [1.000] seconds
Finished loading LanguageModels : [1.000] seconds
Start loading PhraseTable /home/yqhe/iwslt2007/moses_new/enfactordata/tuning/filtered/phrase-table.0-0,1.1 : [1.000] seconds
Finished loading phrase tables : [3.000] seconds
Created input-output object : [3.000] seconds
Translating: 哦 那个 航班 是 C 三 零 六 。
moses: LanguageModelSRI.cpp:154: virtual float LanguageModelSRI::GetValue(const std::vector<const Word*, std::allocator<const Word*> >&, const void**, unsigned int*) const: Assertion `(*contextFactor[count-1])[factorType] != __null' failed.
Aborted (core dumped)
Exit code: 134
The decoder died.
///////////////////////////////////////////////////////////////////
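The assertion above comes from Moses's SRILM wrapper: it fires when a word reaching the POS language model is missing the factor that model queries. In a factored setup of this kind, the two LMs are typically bound to factors in the old-style [lmodel-file] section of moses.ini (fields: implementation, factor, order, file; implementation 0 = SRILM), roughly as in the sketch below, with the paths taken from the log above. Every word produced by the phrase table and input must then actually carry factor 1 (the POS tag), or the assertion fails.

///////////////////////////////////////////////////////////////////
[lmodel-file]
0 0 3 /home/yqhe/iwslt2007/moses_new/enfactordata/lm/en.lm
0 1 3 /home/yqhe/iwslt2007/moses_new/enfactordata/lm/pos.lm
///////////////////////////////////////////////////////////////////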
The configuration reported by mert-moses.pl was:

///////////////////////////////////////////////////////////////////
CONFIG WAS -w 0.000000 -lm 0.100000 0.100000 -d 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 -tm 0.030000 0.020000 0.030000 0.020000 0.000000
///////////////////////////////////////////////////////////////////

So I don't know how to train an LM file with SRILM. Can you tell me how you trained pos.lm, including the specific ngram-count options?

Best regards,
He Yanqing

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
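As referenced in the message above, a minimal sketch of the tag-extraction step, assuming one sentence of word|TAG tokens per line and a hypothetical input file name tagged_EN.txt:

///////////////////////////////////////////////////////////////////
# delete everything up to and including the '|' in each token,
# turning "the|DT light|NN was|VBD red|JJ .|." into "DT NN VBD JJ ."
sed 's/[^ ]*|//g' tagged_EN.txt > EN_pos.txt
///////////////////////////////////////////////////////////////////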
