Hi, I think you can run IRSTLM's scripts as follows:
build-sublm.pl --size <N> --ngrams <YOUR_NGRAM_FILE> --sublm <YOUR_NGRAM_FILE.LM> [--prune-singletons] [--kneser-ney|--witten-bell]
merge-sublm.pl --size <N> --sublm <YOUR_NGRAM_FILE.LM> -lm iARPA_LM.gz

(Then, from the ARPA file, you can use KenLM to build a binary LM file.)

--
Cheers,
Vu

On Thu, Jan 24, 2013 at 6:14 AM, Peled Guy <[email protected]> wrote:
> Hi,
>
> I'm working on a transliteration project.
> The input is a word in one language and the output is the same word in
> English (not translated).
> My language model will be created from the Google 1-gram file, with each
> letter of a word treated as a word.
> This is the original file:
>
> </S> 95119665584
> <S> 95119665584
> , 30578667846
> . 22077031422
> <UNK> 21594821357
> the 19401194714
> - 16337125274
> of 12765289150
> and 12522922536
>
> This is the file after inserting spaces between the letters of each word:
>
> t h e 19401194714
> - 16337125274
> o f 12765289150
> a n d 12522922536
>
> Now I have a "1-gram" file that contains not just 1-grams (one word per
> line) but also 2-grams/3-grams/etc.
> How can I run SRILM's "ngram-count" to create a language model?
> When I run it normally, the integers are treated as words too, and not
> as probabilities/counts.
>
> Can anyone help me please?
>
> Thank you,
> Guy.
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
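The underlying difficulty in Guy's question is that each letter-spaced line is a whole word carrying a single count, whereas a letter-level LM needs counts for every letter n-gram. A minimal Python sketch of one way to get them, under stated assumptions: each word is treated as a "sentence" of letters wrapped in <s> ... </s>, and the word's frequency is added to every letter n-gram up to the chosen order. The boundary markers, skipping the <S>/</S>/<UNK> tokens, the tab-separated output, and the function names are all hypothetical choices for illustration, not something from the thread.

```python
"""Sketch: turn a word-frequency list into count-weighted letter n-grams.

Assumptions (hypothetical): input lines are "<token> <count>", each word
becomes one sentence of letters wrapped in <s> ... </s>, special tokens are
skipped, and output lines are "w1 w2 ... wn<TAB>count".
"""
from collections import Counter
from typing import Iterable, Iterator

# Tokens in the Google file that are already special symbols (assumption: skip them).
SPECIAL = {"<S>", "</S>", "<UNK>"}


def spaced_letters(word: str) -> str:
    """Insert spaces between letters: 'the' -> 't h e'."""
    return " ".join(word)


def char_ngram_counts(lines: Iterable[str], order: int) -> Counter:
    """Accumulate count-weighted letter n-grams of orders 1..order."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or parts[0] in SPECIAL:
            continue  # skip malformed lines and special tokens
        word, freq = parts[0], int(parts[1])
        symbols = ["<s>"] + list(word) + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(symbols) - n + 1):
                # every occurrence of the word contributes one n-gram occurrence
                counts[tuple(symbols[i:i + n])] += freq
    return counts


def format_counts(counts: Counter) -> Iterator[str]:
    """Yield 'w1 w2 ... wn<TAB>count' lines, sorted for stable output."""
    for ngram, c in sorted(counts.items()):
        yield " ".join(ngram) + "\t" + str(c)


if __name__ == "__main__":
    sample = [
        "</S> 95119665584",
        "the 19401194714",
        "- 16337125274",
        "of 12765289150",
    ]
    print(spaced_letters("the"))
    for out in format_counts(char_ngram_counts(sample, order=3)):
        print(out)
```

The output could then be passed to SRILM with something like "ngram-count -order 3 -read letter_counts.txt -lm letters.lm" (the file names are hypothetical), since -read takes lines of an N-gram followed by its count rather than running text. Whether IRSTLM's build-sublm.pl accepts the same layout via --ngrams is an assumption worth checking. Accumulating weighted counts this way avoids writing each word out as text billions of times.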
