Re: [Moses-support] Creating Language Model from google 1gram file

HOANG Cong Duy Vu Thu, 24 Jan 2013 02:41:35 -0800

Hi,

I guess you can run the IRSTLM scripts as follows:


build-sublm.pl --size <N> --ngrams <YOUR_NGRAM_FILE> --sublm <YOUR_NGRAM_FILE.LM> [--prune-singletons] [--kneser-ney|--witten-bell]
merge-sublm.pl --size <N> --sublm <YOUR_NGRAM_FILE.LM> -lm iARPA_LM.gz
(Then, once you have an ARPA file, you can use KenLM to build a binary LM file.)
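
In case the n-gram file itself is the sticking point, here is a rough sketch of one way to turn the original word-count file into letter-level n-gram counts, keeping the count apart from the tokens. It is only an illustration: the function names, the <s> and </s> padding, the choice to skip <S>, </S> and <UNK>, and the "tokens<TAB>count" output layout are all my assumptions, so check them against the input format your toolkit expects.

```python
from collections import Counter

# Assumption: these corpus-level markers are not real words, so skip them.
SPECIAL = {"<S>", "</S>", "<UNK>"}

def char_ngram_counts(lines, order=3):
    """Accumulate letter n-gram counts from 'word<whitespace>count' lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, freq = parts[0], int(parts[1])
        if word in SPECIAL:
            continue
        # Treat each letter as a token and pad with word-boundary markers.
        chars = ["<s>"] + list(word) + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(chars) - n + 1):
                # Each n-gram inherits the frequency of the word it came from.
                counts[tuple(chars[i:i + n])] += freq
    return counts

def format_counts(counts):
    """Render counts as 'tok1 tok2 ... tokN<TAB>count' lines, sorted by n-gram."""
    return [" ".join(gram) + "\t" + str(c) for gram, c in sorted(counts.items())]

if __name__ == "__main__":
    sample = ["the\t19401194714", "of\t12765289150", "and\t12522922536"]
    for out_line in format_counts(char_ngram_counts(sample, order=2)):
        print(out_line)
```

The point of the tab before the count is that the number is then a separate field and not one more token. As far as I know, SRILM's ngram-count can read a counts file in this layout through its -read option, but please verify that against the man page for your version.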

--
Cheers,
Vu


On Thu, Jan 24, 2013 at 6:14 AM, Peled Guy <[email protected]> wrote:

> Hi,
>
> I'm working on a transliteration project.
> The input is a word in one language and the output is the same word in
> English (not translated).
> My language model will be created from the Google 1-gram file, where each
> letter of a word is treated as a word.
> This is the original file:
>
> </S>    95119665584
> <S>     95119665584
> ,       30578667846
> .       22077031422
> <UNK>   21594821357
> the     19401194714
> -       16337125274
> of      12765289150
> and     12522922536
>
> This is the file after inserting spaces between the letters of each word:
>
> t h e     19401194714
> -       16337125274
> o f      12765289150
> a n d     12522922536
>
> Now I have a "1gram" file that contains not just 1-grams (one word per
> line), but also 2-grams, 3-grams, etc.
> How can I run the SRILM "ngram-count" tool to create a language model?
> When I run it normally, the integers are counted as words too,
> and not as probabilities or numbers of occurrences.
>
> Can anyone help me please?
>
> Thank you,
> Guy.
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

