Hi,
This is still all very new to me, so apologies if this is not the correct place to ask this question.

I want to take the English Trie Language Model (5.5TB) created from the Common Crawl data set:
http://data.statmt.org/ngrams/lm/en.trie
and extract all n-grams that contain a certain word. This needs to be done for a list of 100 words.

For example, if I were looking for all n-grams containing the word "discombobulated", I would want an output file listing each n-gram that contains that word, followed by the number of times that n-gram occurs:

word1 discombobulated 25
word1 discombobulated word3 40

Due to the size of the file, I am keen to get this right the first time. Could someone give me an example of how this can be done? And would this kind of query be possible with 64GB of RAM?

Thanks,
Graeme
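P.S. To make the request concrete, here is a minimal sketch of the kind of filter I have in mind. It rests on an assumption that may not hold: that the n-grams can be read as plain text, one per line, in the form `ngram<TAB>count`. I do not know whether the linked file can be read this way. The function name `filter_ngrams` and the sample data are hypothetical, for illustration only.

```python
import io


def filter_ngrams(lines, targets):
    """Yield (target, ngram, count) for each n-gram that contains a target word.

    ASSUMPTION: each input line looks like "w1 w2 ... wn<TAB>count".
    This is an illustrative format, not a known property of en.trie.
    """
    targets = set(targets)
    for line in lines:
        line = line.rstrip("\n")
        # Split on the last tab: everything before it is the n-gram.
        ngram, sep, count = line.rpartition("\t")
        if not sep:
            continue  # no tab found, so not an "ngram<TAB>count" line
        tokens = ngram.split(" ")
        # Match whole tokens only, so "discombobulated" does not match
        # inside a longer word.
        for word in targets.intersection(tokens):
            yield word, ngram, int(count)


if __name__ == "__main__":
    # Made-up sample standing in for the real data.
    sample = io.StringIO(
        "word1 discombobulated\t25\n"
        "word1 discombobulated word3\t40\n"
        "some other words\t7\n"
    )
    for word, ngram, count in filter_ngrams(sample, ["discombobulated"]):
        print(f"{ngram} {count}")
```

Because this reads one line at a time, memory use for the filtering step would depend on the 100-word list rather than on the size of the input, so under the stated assumption 64GB should be more than enough. The open question is whether the data can be streamed in such a form at all.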
_______________________________________________ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support