Hi Michael,

By default the buildlm tool uses Stupid Back-off smoothing and sets the ngram order to 3. (These default values should be printed to stderr when you run the command.) So your model.BloomMap contains a stupid-backoff smoothed trigram model estimated from the corpus train2 that you provided.

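
To make concrete what a stupid-backoff trigram score is, here is a minimal Python sketch. It is not RandLM's code: it uses exact counts held in memory rather than quantized counts in a Bloom filter, the function names are made up for illustration, and the back-off factor of 0.4 is the value suggested by Brants et al. (2007), which may not be what RandLM uses. It also assumes each sentence is padded with <s> and </s>. Unlike the scores file, it prints each ngram next to its log10 score.

```python
import math
from collections import Counter

ORDER = 3       # trigram model, matching buildlm's default order
BACKOFF = 0.4   # assumed back-off factor (Brants et al., 2007); not taken from RandLM


def count_ngrams(sentences, order=ORDER):
    """Count all n-grams up to `order`, padding each sentence with <s> and </s>."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(ngram, counts, total_tokens):
    """Relative frequency of the longest seen suffix, times BACKOFF per back-off step."""
    penalty = 1.0
    while len(ngram) > 1:
        if counts[ngram] > 0:
            return penalty * counts[ngram] / counts[ngram[:-1]]
        ngram = ngram[1:]          # drop the oldest context word
        penalty *= BACKOFF
    return penalty * counts[ngram] / total_tokens   # unigram; 0.0 for an unseen word


def score_sentence(sentence, counts, order=ORDER):
    """Return a list of (ngram, log10 score) pairs, one per token after <s>."""
    total_tokens = sum(c for ng, c in counts.items() if len(ng) == 1)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    scored = []
    for i in range(1, len(tokens)):
        ngram = tuple(tokens[max(0, i - order + 1):i + 1])
        score = stupid_backoff(ngram, counts, total_tokens)
        scored.append((ngram, math.log10(score) if score > 0 else float("-inf")))
    return scored


# Toy corpus standing in for train2.
counts = count_ngrams(["the cat sat", "the cat ran"])
for ngram, logprob in score_sentence("the cat sat", counts):
    print(" ".join(ngram), logprob)
```

Note that these scores are not normalized probabilities; stupid backoff trades normalization for simplicity, which is why it suits very large or approximate count stores.
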
The querylm tool functions much like the ngram tool in the SRILM toolkit. It adds <s> </s> to each sentence in the test corpus by default and scores each ngram in turn. It would be easy to get the ngrams themselves by adding a print statement at the start of the StupidBackOffRandLM::getProb function in RandLM.cpp.

As Philipp said, there aren't actually any ngrams in the model, just 0s and 1s. If you're interested in seeing what the quantization/approximation errors look like, you might want to add the -get-counts flag to your querylm command. That will produce counts rather than backed-off log-probs, which you could check against the model.counts.sorted file that should have been produced by buildlm.

Cheers,
David

Quoting Michael Zuckerman <[EMAIL PROTECTED]>:

> Hi,
>
> I am trying to generate Language Model with RandLM tool, which uses Bloom
> filter. I ran the tool with the commands
>
> ../randlm/bin/buildlm -struct BloomMap -falsepos 8 -values 8 -output-prefix model < ./train2
> ../randlm/bin/querylm -randlm model.BloomMap -test-path train2 -test-type corpus > scores
>
> where the file train2 contains the tokenized lowercased corpus. The second
> command produced the file scores, which contains the logs of the
> probabilities of the ngrams.
> However, this file (scores) does not contain the ngrams, So it's unclear to
> what ngrams these probabilities relate. Could you please help - how can I
> extract something like ARPA format from the files RandLM produces.
>
> Thanks,
> Michael.

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
