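Try filtering the corpus by sentence length with clean-corpus-n.perl
(the last two arguments are the minimum and maximum sentence length in
tokens), for example: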
clean-corpus-n.perl corpus-in ru en corpus-out 1 10

or buy a 2 terabyte hard drive ;-)
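
To make the effect of the length limits concrete, here is a minimal sketch
of that kind of filter in Python. It is not the script's actual
implementation (clean-corpus-n.perl does additional cleanup); the function
name filter_by_length and the sample sentence pairs are made up for
illustration. It only shows the idea: dropping pairs outside the 1 to 10
token range leaves fewer and shorter sentences for rule extraction.

```python
def filter_by_length(pairs, min_len=1, max_len=10):
    """Keep sentence pairs whose two sides both have between
    min_len and max_len whitespace-separated tokens (inclusive)."""
    kept = []
    for src, tgt in pairs:
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            kept.append((src, tgt))
    return kept


# Hypothetical sample data: one good pair, one with an empty source side,
# and one whose source side has 12 tokens.
pairs = [
    ("это тест", "this is a test"),
    ("", "empty source side"),
    ("очень " * 11 + "длинное", "a short target"),
]

kept = filter_by_length(pairs)
print(len(kept))  # 1
```

With limits of 1 and 10, only the first pair survives: the second has an
empty source side and the third is too long.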

2009/12/14 zhmmc <[email protected]>

> Hi,
>     I have run into a problem while training a hierarchical model with the
> train-model.pl script. The parallel corpus I am using has more than two
> million sentences. When Moses extracts rules from the corpus, it produces
> so many rules that I do not have enough disk space to store them. The rules
> take up more than 100 GB of disk space, so the extraction process aborts.
> Is there any way to reduce the disk space needed when extracting rules?
> At the moment I cannot train a hierarchical model.
> Thanks in advance.
>
> zhu hai
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
