[Moses-support] Creating Language Model from google 1gram file

Peled Guy Thu, 24 Jan 2013 01:09:40 -0800

Hi,

I'm working on a transliteration project.
The input is a word in one language and the output is the same word in
English (transliterated, not translated).
My language model will be created from the Google 1gram file, with each
letter of a word treated as a separate word.
This is the original file:


</S>    95119665584
<S>     95119665584
,       30578667846
.       22077031422
<UNK>   21594821357
the     19401194714
-       16337125274
of      12765289150
and     12522922536

This is the file after inserting spaces between the letters of each word:

t h e     19401194714
-       16337125274
o f      12765289150
a n d     12522922536
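The letter-spacing step above can be sketched in Python. This is a minimal sketch, assuming each input line is a word followed by its count, separated by whitespace. Skipping the marker tokens `<S>`, `</S>` and `<UNK>`, the function name `space_letters` and the file names in the usage comment are illustrative choices, not part of the Google data format. A tab is written before the count so it stays distinct from the letter tokens.

```python
import sys
from typing import Optional

# Marker tokens present in the Google 1gram list; skipped here as an
# assumption, since they are not real words to spell out letter by letter.
SPECIAL_TOKENS = {"<S>", "</S>", "<UNK>"}


def space_letters(line: str) -> Optional[str]:
    """Turn 'the 19401194714' into 't h e<TAB>19401194714'.

    Returns None for blank or malformed lines and for marker tokens.
    """
    parts = line.split()
    if len(parts) != 2:
        return None
    word, count = parts
    if word in SPECIAL_TOKENS:
        return None
    # A tab before the count keeps it separate from the letter tokens.
    return " ".join(word) + "\t" + count


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical file names): python space_letters.py 1gram.txt spaced.txt
    with open(sys.argv[1], encoding="utf-8") as src, \
         open(sys.argv[2], "w", encoding="utf-8") as dst:
        for raw in src:
            out = space_letters(raw)
            if out is not None:
                dst.write(out + "\n")
```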

Now I have a "1gram" file that contains not just 1-grams (one word per line)
but also 2-grams, 3-grams, etc.
How can I run SRILM's "ngram-count" to create a language model?
When I run it normally, the integers are treated as words too, not as
probabilities or occurrence counts.
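The integers become words because plain text input (`-text`) treats every whitespace-separated token as a word. ngram-count can instead read precomputed counts with `-read`, where each line is an n-gram followed by its count. Feeding the letter-spaced lines to `-read` directly would only record "t h e" as one trigram count, without the shorter n-grams inside the word or any boundary markers. A minimal sketch, assuming each word should be modeled as its own "sentence" of letters wrapped in `<s> ... </s>`: expand every word into all of its letter n-grams up to the model order, weight each by the word's frequency, and write a counts file. The script name, file names, order 3 and the skipped marker tokens are hypothetical choices.

```python
import sys
from collections import Counter
from typing import Iterable, TextIO

# Assumed marker tokens in the Google list that should not be spelled out.
SPECIAL_TOKENS = {"<S>", "</S>", "<UNK>"}


def letter_ngram_counts(lines: Iterable[str], order: int = 3) -> Counter:
    """Accumulate letter n-gram counts, weighting each word by its frequency.

    Each input line is 'word<whitespace>count' (the original, unspaced
    1gram format). Each word becomes the sequence <s> l1 l2 ... ln </s>.
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or parts[0] in SPECIAL_TOKENS:
            continue  # skip blank/malformed lines and marker tokens
        word, freq = parts[0], int(parts[1])
        symbols = ["<s>"] + list(word) + ["</s>"]
        # Every n-gram of every order up to `order` occurs `freq` times.
        for n in range(1, order + 1):
            for i in range(len(symbols) - n + 1):
                counts[tuple(symbols[i:i + n])] += freq
    return counts


def write_counts(counts: Counter, out: TextIO) -> None:
    """Write one n-gram per line: its tokens, a tab, then the count."""
    for ngram, c in sorted(counts.items()):
        out.write(" ".join(ngram) + "\t" + str(c) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical names): python letter_counts.py 1gram.txt letters.counts
    with open(sys.argv[1], encoding="utf-8") as src:
        table = letter_ngram_counts(src, order=3)
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        write_counts(table, dst)
```

The resulting file could then be passed to SRILM with something like `ngram-count -order 3 -read letters.counts -lm letters.lm`, keeping `-order` equal to the order used when generating the counts. Because these counts are word frequencies rather than raw observations, the default Good-Turing discounting may behave poorly; trying an explicit method such as `-wbdiscount` is worth considering.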

Can anyone help me please?

Thank you,
Guy.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
