Re: [Moses-support] Different scores with SRILM and IRSTLM

2010-10-29 Thread Felipe Sánchez Martínez
Hi Kenneth, The output of kenlm/query is: Loading the LM will be faster if you build a binary file. Reading english.5gram.lm 5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

Re: [Moses-support] Different scores with SRILM and IRSTLM

2010-10-29 Thread Felipe Sánchez Martínez
Hi Philipp, I'm not using the decoder. I am using SRILM directly and scoring sentences using the following piece of code: TextStats ts; VocabString words[maxWordsPerLine+1]; char segment_str[segment.size()+1]; //sentence to score is in segment segment.copy(segment_str,

[Moses-support] Problem in building a 4gram language model

2010-10-29 Thread amin farajian
Hi, I know this is not a good place to ask this question, but I couldn't find another place to ask. I'm using SRILM to build a 4-gram language model on 570 MB of data (about 6,500,000 sentences) using this command: tools/srilm/bin/i686/ngram-count -order 4 -interpolate -kndiscount -unk -text

Re: [Moses-support] KenLM distributed with Moses

2010-10-29 Thread support
Ken, Your new enhancements ROCK! Here are some numbers using rev 3675 and IRSTLM 5.50.01 Machine: Core2Quad, 2.4 GHz, 4 GB RAM Data: EN-NL sample data, 37,500 segments (micro test sample) 3-gram LM, 3-gram tables (for fast testing) Train LM with SRILM Train tables/tune/eval with

Re: [Moses-support] KenLM distributed with Moses

2010-10-29 Thread Kenneth Heafield
Thanks for sharing! Looks like building my Moses system from scratch finally finished, so I'll be making some memory benchmarks today too. Just so I understand, you ran separate MERT for each of your three cases? Then MERT randomness should explain the insignificant difference in BLEU between

Re: [Moses-support] Different scores with SRILM and IRSTLM

2010-10-29 Thread John Burger
Kenneth Heafield wrote: kenlm's query tool implicitly places <s> at the beginning. It doesn't appear in the output, but you can see the effect because the n-gram length after the is 2, not 1. Does this happen when kenlm is called from Moses as well? There seem to me to be many reasons not to

Re: [Moses-support] KenLM distributed with Moses

2010-10-29 Thread support
Yes, all scores and times were from scratch without reusing anything. Precision Translation Tools will announce a simpler solution to building a moses system from scratch next week. Essentially, from minimal server configuration to completely installed Moses system in four steps and 30 minute

Re: [Moses-support] Different scores with SRILM and IRSTLM

2010-10-29 Thread Kenneth Heafield
That documentation was specific to kenlm's query tool. kenlm does the same thing as SRI with respect to sentence boundary tokens. As to what that is, I'm deferring to Edinburgh. Kenneth On 10/29/10 10:28, John Burger wrote: Kenneth Heafield wrote: kenlm's query tool implicitly places <s> at

Re: [Moses-support] Different scores with SRILM and IRSTLM

2010-10-29 Thread Kenneth Heafield
It looks like Moses, by default at least, implicitly adds <s> and </s> to the target side for language model scoring purposes. This behavior is independent of the LM used inside Moses. I don't know if there's an option to disable this behavior. I can tell you that language models are not designed

Re: [Moses-support] Different scores with SRILM and IRSTLM

2010-10-29 Thread Felipe Sánchez Martínez
Hi Kenneth, Just to tell you that after training SRILM with -unk and adding the following code to my SRILM load function _sri_ngramLM->skipOOVs() = false; I get the same score with SRILM and kenlm. Unfortunately this is not the case for IRSTLM. I'll look at my code because I think that there

Re: [Moses-support] Different scores with SRILM and IRSTLM

2010-10-29 Thread Hieu Hoang
I think there's a difference between how SRI and IRST do backoff. Ken just follows SRI to the letter. I don't think there's a canonical way of doing it, so they both implement it differently. As you saw from Tom's email, the results (in terms of BLEU) are pretty much the same. On 29/10/2010

Re: [Moses-support] Different scores with SRILM and IRSTLM

2010-10-29 Thread Hieu Hoang
Yes, the LM scores are calculated with <s> added to the beginning of the sentence (</s> added to the end). Since a phrase-based decoder creates target sentences left-to-right, you always know where the beginning is. For chart decoding, it's also added but the input explicitly has <s>...</s> so the only

Re: [Moses-support] Different scores with SRILM and IRSTLM

2010-10-29 Thread Felipe Sánchez Martínez
Hi, I have fixed the problem. Now I get exactly the same score with IRSTLM. The code from a colleague I was using did not take into account the initial n-grams of the sentence. Thank you very much for your help Regards -- Felipe El 29/10/10 21:22, Hieu Hoang escribió: yes, the lm scores are
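
The fix described above (scoring the initial n-grams of the sentence, with <s> at the start and </s> at the end) can be illustrated with a small sketch. This is not code from the thread and does not call SRILM, IRSTLM, or kenlm; the function name ngram_events and the example sentence are hypothetical. It only lists which (context, word) pairs a toolkit-independent scorer would need to look up, assuming the boundary handling described in the messages above.

```python
from typing import List, Tuple


def ngram_events(tokens: List[str], order: int) -> List[Tuple[Tuple[str, ...], str]]:
    """Return the (context, word) pairs to score for one sentence.

    Assumption: <s> is prepended and </s> is appended, as described in
    the thread. ngram_events is a hypothetical helper, not a toolkit API.
    """
    padded = ["<s>"] + tokens + ["</s>"]
    events = []
    # Start at index 1: <s> is only ever context, it is never predicted.
    for i in range(1, len(padded)):
        # Near the sentence start the context is shorter than order - 1.
        start = max(0, i - (order - 1))
        events.append((tuple(padded[start:i]), padded[i]))
    return events


if __name__ == "__main__":
    for context, word in ngram_events(["the", "cat", "sat"], order=3):
        print(" ".join(context), "->", word)
```

Skipping the first order - 1 positions, or scoring the first word with an empty context instead of <s>, changes the sentence total, which is one way two toolkits loaded with the same model can appear to disagree.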

[Moses-support] Cannot find SRILM's library Error

2010-10-29 Thread Charles Chiu
Hi, I am fairly new to the Moses project and trying to build it. I have been able to build both the SRILM and IRSTLM libraries and confirmed that the library files are being generated. However, I still encounter the Cannot find SRILM's library error. I have confirmed that the directory is correct and