Hi, if you use "intersect" as the symmetrization method, you will end up with a lot of unaligned words. Unaligned words can be attached freely to neighboring phrases, which allows a very large number of phrase pairs to be extracted. You will likely get better results (and definitely a smaller phrase table) with methods such as "grow-diag-final-and".
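To make the effect concrete, here is a toy sketch in Python. It is not the Moses extractor, just a brute-force version of the usual "consistent phrase pair" rule, and the function name, the 4-word sentence pair and both alignments are made up for illustration. It compares a fully aligned sentence pair with a sparse, intersect-like alignment of the same pair.

```python
def extract_phrase_pairs(src_len, tgt_len, alignment, max_len=7):
    """Return all (source span, target span) pairs consistent with the alignment.

    A pair is consistent if it contains at least one alignment point and no
    alignment point links a word inside one span to a word outside the other.
    Spans are inclusive (start, end) word indices.
    """
    pairs = set()
    for s1 in range(src_len):
        for s2 in range(s1, min(s1 + max_len, src_len)):
            for t1 in range(tgt_len):
                for t2 in range(t1, min(t1 + max_len, tgt_len)):
                    points_inside = 0
                    consistent = True
                    for s, t in alignment:
                        in_src = s1 <= s <= s2
                        in_tgt = t1 <= t <= t2
                        if in_src and in_tgt:
                            points_inside += 1
                        elif in_src or in_tgt:
                            # Point crosses the phrase boundary: not consistent.
                            consistent = False
                            break
                    if consistent and points_inside > 0:
                        pairs.add(((s1, s2), (t1, t2)))
    return pairs


# Hypothetical 4-word sentence pair.
# Dense alignment: every word is aligned (monotone, one-to-one).
dense_alignment = {(0, 0), (1, 1), (2, 2), (3, 3)}
# Sparse, intersect-like alignment: the two middle words are left unaligned.
sparse_alignment = {(0, 0), (3, 3)}

dense_pairs = extract_phrase_pairs(4, 4, dense_alignment)
sparse_pairs = extract_phrase_pairs(4, 4, sparse_alignment)

print("dense alignment: ", len(dense_pairs), "phrase pairs")
print("sparse alignment:", len(sparse_pairs), "phrase pairs")
```

Even on this tiny example the sparse alignment yields noticeably more phrase pairs than the dense one, because each unaligned word can be included in or left out of the neighboring phrases. On a full corpus that growth is what inflates "extract" and "extract.inv". As far as I recall, the heuristic is selected with the --alignment option of train-model.perl, but please check the documentation for your Moses version.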
-phi

On Tue, Apr 3, 2012 at 2:13 AM, Loki Cheng <[email protected]> wrote:
> Hi, everyone
> I got stuck at training step 5 because the generated files are too large,
> such as "extract" and "extract.inv" listed below. I think that if I use the
> parameter
> "--extract-file $ROOT/model/extract.gz", the problem will be resolved.
> My question is: if I do that, does it affect other things? Will that work?
> Any suggestions will be appreciated.
>
> total 198593660
> -rw-rw-r-- 1 loki loki    801392186 Apr 2 09:57 aligned.intersect
> -rw-rw-r-- 1 loki loki 101227601916 Apr 2 23:16 extract
> -rw-rw-r-- 1 loki loki 101227606014 Apr 2 23:16 extract.inv
> -rw-rw-r-- 1 loki loki     51632535 Apr 2 10:30 lex.e2f
> -rw-rw-r-- 1 loki loki     51632535 Apr 2 10:30 lex.f2e
>
> Best regards
> Moonloki
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
