Hi Barry,

That's what I meant by filtering. The 300 GB figure is the runtime memory usage of the pruned phrase table when it is loaded as an instance of PhraseDictionaryMemory. The text file containing the pruned phrase table is roughly 20 GB. The original, unpruned phrase table takes up around 50 GB as a text file, but we are not using that one anyway.

On 19/9/2011, "Barry Haddow" <[email protected]> wrote:
>> Pruning is also not enough, our filtered phrase table still takes around
>> 300 GB when loaded into memory, I did not even dare to try and load the
>> unfiltered phrase-table into memory :). But I will take a look at the
>> implementation from the marathon, thanks.
>
>I think Hieu was referring to this
>http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16
>rather than filtering, which may be of some use. It's hard to imagine that a
>500G phrase table doesn't contain a lot of noise. I'm surprised that filtering
>doesn't remove more though - are you decoding large batches of sentences?

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
