Hi Michael,

By default, the buildlm tool uses Stupid Backoff smoothing and sets
the ngram order to 3. (These default values should be printed to
stderr when you run the command.) So your model.BloomMap contains a
Stupid Backoff-smoothed trigram model estimated from the corpus train2
that you provided.
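For reference, here is a minimal Python sketch of Stupid Backoff scoring as described by Brants et al. (2007). If the full ngram was seen, use its relative frequency. Otherwise, multiply a fixed back-off weight by the score of the shorter ngram. The weight of 0.4 and the function names are assumptions for illustration. RandLM's own implementation, and the log base querylm reports, may differ. Note that Stupid Backoff scores are not normalised probabilities.

```python
from collections import Counter

# Back-off weight from Brants et al. (2007); assumed, not confirmed, for RandLM.
ALPHA = 0.4

def count_ngrams(sentences, order):
    """Count all n-grams up to `order`, padding each sentence with <s> and </s>."""
    counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def stupid_backoff(counts, total_unigrams, ngram):
    """Stupid Backoff score S(w | context) for ngram = context + (w,)."""
    if len(ngram) == 1:
        # Base case: unigram relative frequency (0.0 for unseen words).
        return counts[ngram] / total_unigrams
    if counts[ngram] > 0:
        # Seen n-gram: plain relative frequency, count(context w) / count(context).
        return counts[ngram] / counts[ngram[:-1]]
    # Unseen n-gram: drop the leftmost context word and scale by ALPHA.
    return ALPHA * stupid_backoff(counts, total_unigrams, ngram[1:])

counts = count_ngrams(["the cat sat", "the dog sat"], order=3)
total = sum(c for ng, c in counts.items() if len(ng) == 1)
print(stupid_backoff(counts, total, ("<s>", "the", "cat")))  # 0.5
```

A log of these scores is what a backed-off log-prob column would contain under this scheme. Because no normalisation is done, the scores are cheap to compute, which is why the scheme suits very large or randomised models.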

The querylm tool works much like the ngram tool in the SRILM
toolkit. By default it adds <s> and </s> to each sentence in the test
corpus and scores each ngram in turn. It would be easy to print the
ngrams themselves by adding a print statement at the start of the
StupidBackOffRandLM::getProb function in RandLM.cpp.
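If you would rather not patch RandLM.cpp, you could also regenerate the ngrams yourself and line them up with the scores file. The Python sketch below lists, for each sentence, the ngram an SRILM-style scorer would evaluate at each position. It assumes, without confirmation from the RandLM source, that querylm emits one score per token after <s>, including </s>, using at most order-1 words of history. The function names are hypothetical, and you should check that the line counts match your scores file before trusting the alignment.

```python
def scored_ngrams(sentence, order=3):
    """List the n-grams an SRILM-style scorer would evaluate for one sentence.

    Assumption: one score per token after <s> (including </s>), each using
    at most order-1 preceding tokens as history.
    """
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    result = []
    for i in range(1, len(tokens)):
        start = max(0, i - (order - 1))
        result.append(tuple(tokens[start:i + 1]))
    return result

def dump_ngrams(path, order=3):
    """Print one scored n-gram per line for every sentence in the file at `path`."""
    with open(path, encoding="utf-8") as corpus:
        for line in corpus:
            for ngram in scored_ngrams(line, order):
                print(" ".join(ngram))

for ngram in scored_ngrams("the cat sat"):
    print(" ".join(ngram))
```

Calling dump_ngrams("train2") and redirecting its output to a file would give a list that can be placed side by side with scores (for example with the Unix paste command), provided the one-score-per-token assumption holds.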

As Philipp said, there aren't actually any ngrams stored in the model,
just 0s and 1s.
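To see why there is nothing to dump, here is a minimal Python sketch of a plain Bloom filter. This is a simplification, since RandLM's BloomMap also encodes (quantized) counts, which this sketch does not attempt. Inserting a key only sets k bits. Afterwards you can test whether a given ngram was (probably) inserted, but there is no way to list the keys back out. The SHA-256-based hashing and the sizes are arbitrary choices for illustration.

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter: a bit array plus k hash functions.

    Only bits are stored, so inserted keys cannot be enumerated; the only
    query is "was this key (probably) inserted?".
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        # Derive k bit positions by hashing the key with k different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key: str) -> bool:
        # All k bits set: "probably present" (false positives are possible).
        # Any bit unset: definitely not inserted.
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter(num_bits=1024, num_hashes=3)
bf.add("the cat sat")
print("the cat sat" in bf)  # True
```

The -falsepos parameter in buildlm presumably controls how likely such a "probably present" answer is to be wrong for an ngram that was never inserted.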

If you're interested in seeing what the quantization/approximation
errors look like, you might want to add the -get-counts flag to your
querylm command. That will produce counts rather than backed-off
log-probs, which you can check against the model.counts.sorted file
that buildlm should have produced.
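Once you have both sets of counts, the comparison could look like the Python sketch below. It assumes you have already parsed model.counts.sorted and the -get-counts output into dictionaries mapping an ngram string to a count. The parsing is left out because the exact file formats aren't shown here, and both function names are hypothetical. The first function reports ngrams whose approximate count differs from the exact one. The second lists ngrams the model reports as present that never occurred in training, as a Bloom filter false positive would.

```python
def count_errors(exact: dict[str, int], approx: dict[str, int]):
    """Compare exact n-gram counts with approximate ones.

    Returns (ngram, exact, approx, relative_error) for every n-gram whose
    approximate count differs from its exact count, largest error first.
    N-grams missing from `approx` are treated as having count 0.
    """
    rows = []
    for ngram, true_count in exact.items():
        est = approx.get(ngram, 0)
        if est != true_count:
            rel = abs(est - true_count) / true_count
            rows.append((ngram, true_count, est, rel))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows

def false_positives(exact: dict[str, int], approx: dict[str, int]):
    """N-grams given a non-zero approximate count but absent from the exact counts."""
    return sorted(ng for ng, c in approx.items() if c > 0 and ng not in exact)

exact = {"the cat": 4, "cat sat": 2}
approx = {"the cat": 4, "cat sat": 3}
for row in count_errors(exact, approx):
    print(row)
```

Sorting by relative error puts the most distorted low-count ngrams first, which is usually where quantization hurts most.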

Cheers,
David




Quoting Michael Zuckerman <[EMAIL PROTECTED]>:

> Hi,
>
> I am trying to generate a language model with the RandLM tool, which uses
> Bloom filters. I ran the tool with the commands
> ../randlm/bin/buildlm -struct BloomMap -falsepos 8 -values 8 -output-prefix model < ./train2
> ../randlm/bin/querylm -randlm model.BloomMap -test-path train2 -test-type corpus > scores
>
> where the file train2 contains the tokenized, lowercased corpus. The second
> command produced the file scores, which contains the log probabilities
> of the ngrams.
> However, this file (scores) does not contain the ngrams, so it's unclear
> which ngrams these probabilities relate to. Could you please help? How can
> I extract something like ARPA format from the files RandLM produces?
>
> Thanks,
>     Michael.
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
