Try clean-corpus-n.perl to filter out long sentence pairs before training, for example:

    clean-corpus-n.perl corpus-in ru en corpus-out 1 10
or buy a 2 terabyte hard drive ;-)

2009/12/14 zhmmc <[email protected]>

> Hi,
>
> I have run into a problem while training a hierarchical model with the
> train-model.pl script. The parallel corpus I am using has more than
> two million sentences. When Moses extracts rules from the corpus, it
> extracts so many that I do not have enough disk space to store them.
> The rules take up more than 100 GB of disk space, so the extraction
> process was aborted. Is there any way to reduce the space needed
> during rule extraction? At the moment I cannot train a hierarchical
> model.
>
> Thanks in advance.
>
> zhu hai
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
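
For what it's worth, the effect of the two numeric arguments in the clean-corpus-n.perl example (1 and 10, the minimum and maximum sentence length in tokens) can be sketched roughly as below. This is only an illustration in Python, not the actual Perl script: the function name filter_by_length is made up here, tokens are assumed to be whitespace-separated, and the real script performs further checks that are left out.

```python
def filter_by_length(src_lines, tgt_lines, min_len=1, max_len=10):
    """Keep sentence pairs whose token counts on BOTH sides lie in [min_len, max_len].

    Hypothetical helper for illustration only; it is not part of Moses.
    """
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        # Assumption: the corpus is already tokenized, one sentence per line.
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            kept.append((src.strip(), tgt.strip()))
    return kept


if __name__ == "__main__":
    source = ["a b c", "", "x " * 11, "ok"]
    target = ["d e", "z", "y", "fine then"]
    for pair in filter_by_length(source, target):
        print(pair)
```

A pair is dropped as soon as one side is empty or too long, so the two output files stay line-aligned. Fewer and shorter sentence pairs should in turn mean fewer extracted rules.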
