Hi, I don't know all the details of buildlm, so I can only point out that the randomized language model uses a data structure from which it is impossible to retrieve the n-grams that are stored in it. It is only possible to query the data structure with specific n-grams.
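To illustrate the point above, here is a minimal sketch of a plain Bloom filter in Python. It is not RandLM's code, and the BloomMap structure that RandLM builds is more elaborate; the class name, the sizes, and the hashing scheme are all assumptions chosen for illustration. What it shows is that only bits are stored, so you can test a given n-gram but you cannot list the n-grams that were inserted.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical, not RandLM's BloomMap)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit, for readability rather than compactness.
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        # Only bits are set; the key itself is never stored.
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        # True for every inserted key; may also be True for a key that
        # was never inserted (a false positive).
        return all(self.bits[pos] for pos in self._positions(key))


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


corpus = "the cat sat on the mat".split()
bf = BloomFilter()
for gram in ngrams(corpus, 2):
    bf.add(gram)

# Querying with a candidate n-gram works.
print("the cat" in bf)
# There is no way to go from bf.bits back to the list of bigrams.
```

So to pair scores with n-grams, the n-grams have to come from somewhere else, for example by enumerating them from the corpus itself as `ngrams` does here.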
Having said that, there may be some intermediate data structures, built during the construction of the randomized language model, that retain the type of information you are interested in.

-phi

On Wed, Nov 26, 2008 at 1:26 PM, Michael Zuckerman <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am trying to generate Language Model with RandLM tool, which uses Bloom
> filter. I ran the tool with the commands
> ../randlm/bin/buildlm -struct BloomMap -falsepos 8 -values 8 -output-prefix model < ./train2
> ../randlm/bin/querylm -randlm model.BloomMap -test-path train2 -test-type corpus > scores
>
> where the file train2 contains the tokenized lowercased corpus. The second
> command produced the file scores, which contains the logs of the
> probabilities of the ngrams.
> However, this file (scores) does not contain the ngrams, So it's unclear to
> what ngrams these probabilities relate. Could you please help - how can I
> extract something like ARPA format from the files RandLM produces.
>
> Thanks,
> Michael.
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
