Hi,

Not knowing all the details of buildlm, I can only
point out that the randomized language model uses
a data structure from which it is impossible to retrieve
a list of the n-grams stored in it. The data structure can
only be queried with specific n-grams that you supply.
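
To illustrate the point, here is a minimal Python sketch of a toy
Bloom filter. It is not RandLM's actual BloomMap implementation: the
class name ToyBloomFilter, its parameters, and the salted SHA-256
hashing are all assumptions made up for this example. It only shows
why such a structure can answer "is this n-gram stored?" but cannot
list what was stored.

```python
import hashlib


class ToyBloomFilter:
    """Toy Bloom filter (hypothetical, for illustration only).

    It keeps a bit array and nothing else, so the inserted n-grams
    themselves are never stored anywhere.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, ngram):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        key = " ".join(ngram)
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, ngram):
        # Inserting only sets bits; the n-gram text is discarded.
        for pos in self._positions(ngram):
            self.bits[pos] = True

    def might_contain(self, ngram):
        # True means "probably stored" (false positives are possible);
        # False means "definitely not stored".
        return all(self.bits[pos] for pos in self._positions(ngram))


bloom = ToyBloomFilter()
bloom.add(("the", "cat"))
bloom.add(("cat", "sat"))

# Querying requires supplying the n-gram yourself.
print(bloom.might_contain(("the", "cat")))
print(bloom.might_contain(("dog", "ran")))
```

After the two insertions, the only state left is a handful of set
bits, and many different n-grams could have produced them. There is
therefore nothing to iterate over to recover the original n-grams;
you can only test candidates that you already have.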

Having said that, there may be some intermediate data
structures, built during the construction of the
randomized language model, that retain the type
of information you are interested in.

-phi

On Wed, Nov 26, 2008 at 1:26 PM, Michael Zuckerman
<[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am trying to generate a language model with the RandLM tool, which uses a
> Bloom filter. I ran the tool with the commands
> ../randlm/bin/buildlm -struct BloomMap -falsepos 8 -values 8 -output-prefix
> model < ./train2
> ../randlm/bin/querylm -randlm model.BloomMap -test-path train2 -test-type
> corpus > scores
>
> where the file train2 contains the tokenized lowercased corpus. The second
> command produced the file scores, which contains the logs of the
> probabilities of the ngrams.
> However, this file (scores) does not contain the ngrams, so it's unclear to
> which ngrams these probabilities relate. Could you please help? How can I
> extract something like ARPA format from the files RandLM produces?
>
> Thanks,
>     Michael.
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>