Hi Kenneth,
The output of kenlm/query is:
Loading the LM will be faster if you build a binary file.
Reading english.5gram.lm
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Hi Philipp,
I'm not using the decoder. I am using SRILM directly and scoring
sentences using the following piece of code:
TextStats ts;
VocabString words[maxWordsPerLine+1];
char segment_str[segment.size()+1]; //sentence to score is in segment
segment.copy(segment_str,
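The snippet above is cut off, but the pattern it starts (copying a std::string into a null-terminated C buffer so SRILM can tokenize it in place) can be sketched in a self-contained way. This is only an illustration of that buffer-handling step, with a hypothetical helper name; the SRILM calls themselves are omitted:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Copy a std::string into a null-terminated C buffer. SRILM's tokenizer
// modifies the buffer in place, so a mutable copy is needed. A
// std::vector<char> stands in for the variable-length array in the
// original snippet (VLAs are a compiler extension, not standard C++).
std::vector<char> toCBuffer(const std::string &segment) {
    std::vector<char> buf(segment.size() + 1);
    segment.copy(buf.data(), segment.size());
    buf[segment.size()] = '\0';  // std::string::copy does not null-terminate
    return buf;
}
```

The explicit `'\0'` at the end matters: `std::string::copy` copies only the characters, and a missing terminator is a common source of garbage tokens when the buffer is later handed to a C-style tokenizer.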
Hi,
I know this is not a good place to ask this question, but I couldn't
find another place to ask.
I'm using SRILM to build a 4-gram language model on 570MB of data (about 6,500,000
sentences) using this command:
tools/srilm/bin/i686/ngram-count -order 4 -interpolate -kndiscount
-unk -text
Ken,
Your new enhancements ROCK! Here are some numbers using rev 3675 and
IRSTLM 5.50.01
Machine: Core2Quad, 2.4 GHz, 4 GB RAM
Data: EN-NL sample data, 37,500 segments (micro test sample)
3 gram LM, 3 gram tables (for fast testing)
Train LM with SRILM
Train tables/tune/eval with
Thanks for sharing! Looks like building my Moses system from scratch
finally finished, so I'll be making some memory benchmarks today too.
Just so I understand, you ran separate MERT for each of your three
cases? Then MERT randomness should explain the insignificant difference
in BLEU between
Kenneth Heafield wrote:
kenlm's query tool implicitly places <s> at the beginning. It doesn't
appear in the output, but you can see the effect because the n-gram
length reported for the first word is 2, not 1.
Does this happen when kenlm is called from Moses as well?
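The effect described above can be shown with a toy lookup, independent of kenlm itself. This is a hand-rolled two-entry n-gram table with a made-up function name, not kenlm's API: with an implicit <s> prepended, the first word of a sentence can match the bigram "<s> word", so its reported n-gram length is 2 rather than 1.

```cpp
#include <set>
#include <string>

// Toy model: the set of n-grams the LM knows about.
static const std::set<std::string> kKnownNgrams = {
    "the",      // unigram
    "<s> the",  // bigram: "the" observed after sentence start
};

// Length of the longest known n-gram ending in `word` given `context`
// (0 means the word is unknown even as a unigram).
unsigned matchedLength(const std::string &context, const std::string &word) {
    if (!context.empty() && kKnownNgrams.count(context + " " + word))
        return 2;
    return kKnownNgrams.count(word) ? 1 : 0;
}
```

Querying "the" with the implicit "<s>" context matches the bigram (length 2); with no context it only matches the unigram (length 1), which is exactly the difference visible in the query tool's output.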
There seem to me to be many reasons not to
Yes, all scores and times were from scratch without reusing anything.
Precision Translation Tools will announce a simpler solution to building a
moses system from scratch next week. Essentially, from minimal server
configuration to completely installed Moses system in four steps and 30
minutes
That documentation was specific to kenlm's query tool. kenlm does the
same thing as SRI with respect to sentence boundary tokens. As to what
that is, I'm deferring to Edinburgh.
Kenneth
On 10/29/10 10:28, John Burger wrote:
Kenneth Heafield wrote:
kenlm's query tool implicitly places <s> at
It looks like Moses, by default at least, implicitly adds <s> and </s>
to the target side for language model scoring purposes. This behavior
is independent of the LM used inside Moses. I don't know if there's an
option to disable this behavior. I can tell you that language models
are not designed
Hi Kenneth,
Just to tell you that after training SRILM with -unk and adding the
following code to my SRILM load function
_sri_ngramLM->skipOOVs() = false;
I get the same score with SRILM and kenlm. Unfortunately this is not the
case for IRSTLM. I'll look at my code because I think that there
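What flipping skipOOVs changes can be sketched with toy numbers (the table, values, and function name below are made up for illustration; this is not SRILM's API): with skipping on, out-of-vocabulary words contribute nothing to the sentence score, while with it off each OOV is scored as <unk>, so the two settings produce different totals on any sentence containing an OOV.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy unigram log10 probabilities; any word absent from the map is OOV.
static const std::map<std::string, double> kLogProb = {
    {"<unk>", -5.0}, {"hello", -2.0}, {"world", -3.0},
};

// Sum log10 probabilities over a sentence. If skipOOVs is true, OOV
// words are silently dropped; otherwise each OOV scores as <unk>.
double sentenceLogProb(const std::vector<std::string> &words, bool skipOOVs) {
    double total = 0.0;
    for (const std::string &w : words) {
        auto it = kLogProb.find(w);
        if (it != kLogProb.end())
            total += it->second;
        else if (!skipOOVs)
            total += kLogProb.at("<unk>");
    }
    return total;
}
```

This is why training with -unk and then disabling OOV skipping is needed to make two LM toolkits agree: only then is every word, known or not, assigned some probability mass by both.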
I think there's a difference between how SRI and IRST do backoff. Ken
just follows SRI to the letter.
I don't think there's a canonical way of doing it, so they both implement
it differently. As you saw from Tom's email, the results (in terms of
BLEU) are pretty much the same.
On 29/10/2010
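For reference, the standard backoff recursion looks roughly like the sketch below (toy bigram model, made-up values and names; the edge cases, such as a missing backoff weight here treated as 0, are exactly where implementations can legitimately differ):

```cpp
#include <map>
#include <string>

// Toy backoff bigram model, log10 probabilities.
static const std::map<std::string, double> kBigram  = {{"a b", -0.5}};
static const std::map<std::string, double> kUnigram = {{"a", -1.0}, {"b", -1.5}};
static const std::map<std::string, double> kBackoff = {{"a", -0.3}};

// P(word | context): use the bigram if present; otherwise back off to
// the unigram, adding the context's backoff weight (0 if it has none).
double logProb(const std::string &context, const std::string &word) {
    auto it = kBigram.find(context + " " + word);
    if (it != kBigram.end())
        return it->second;
    auto bo = kBackoff.find(context);
    double weight = (bo != kBackoff.end()) ? bo->second : 0.0;
    return weight + kUnigram.at(word);
}
```

Two toolkits that agree on every stored probability can still diverge in the backoff branch (e.g. how an unseen context or a missing weight is handled), which is consistent with small score differences that wash out in BLEU.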
Yes, the LM scores are calculated with <s> added to the beginning of
the sentence (and </s> added to the end). Since a phrase-based decoder creates
target sentences left-to-right, you always know where the beginning is.
For chart decoding, it's also added, but the input explicitly has
<s>...</s> so the only
Hi,
I have fixed the problem. Now I get exactly the same score with IRSTLM.
The code from a colleague I was using did not take into account the
initial n-grams of the sentence.
Thank you very much for your help
Regards
--
Felipe
On 29/10/10 21:22, Hieu Hoang wrote:
yes, the lm scores are
Hi,
I am fairly new to the Moses project and trying to build it. I have been able to
build both SRILM and IRSTLM library and confirmed that the library files are
being generated.
However, I still encounter the "Cannot find SRILM's library" error. I have
confirmed that the directory is correct and