Hi,

If you use "intersect" as the symmetrization method, you will end up with
a lot of unaligned words, which allow a large number of phrase pairs
to be extracted. You will likely get better results (and definitely a smaller
phrase table) with methods such as "grow-diag-final-and".
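Here is a minimal brute-force sketch in Python of why fewer alignment links can mean more phrase pairs. It counts phrase pairs consistent with a word alignment under the usual definition: no link may connect a word inside one span to a word outside the other, and at least one link must fall inside both spans. It compares a toy diagonal alignment with a sparser one whose two interior words are unaligned, roughly what "intersect" tends to leave behind. The function name `count_phrase_pairs`, the `max_len` limit, and the toy alignments are hypothetical illustrations. This is not the Moses extract code.

```python
def count_phrase_pairs(src_len, tgt_len, alignment, max_len=7):
    """Count phrase pairs consistent with a word alignment.

    alignment: set of (src_index, tgt_index) links, 0-based.
    max_len: hypothetical maximum phrase length on each side.
    """
    count = 0
    for s_start in range(src_len):
        for s_end in range(s_start, min(src_len, s_start + max_len)):
            for t_start in range(tgt_len):
                for t_end in range(t_start, min(tgt_len, t_start + max_len)):
                    has_link = False
                    consistent = True
                    for s, t in alignment:
                        s_in = s_start <= s <= s_end
                        t_in = t_start <= t <= t_end
                        if s_in != t_in:
                            # A link crosses the span boundary: inconsistent.
                            consistent = False
                            break
                        if s_in:
                            # Link lies inside both spans.
                            has_link = True
                    if consistent and has_link:
                        count += 1
    return count


if __name__ == "__main__":
    dense = {(0, 0), (1, 1), (2, 2), (3, 3)}   # every word aligned
    sparse = {(0, 0), (3, 3)}                  # words 1 and 2 unaligned
    print(count_phrase_pairs(4, 4, dense))     # 10
    print(count_phrase_pairs(4, 4, sparse))    # 19
```

On this 4-word toy pair, leaving two words unaligned nearly doubles the number of extractable phrase pairs (19 vs. 10), because unaligned words can attach to any neighbouring span. Over a large corpus this effect can inflate the `extract` files considerably.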

-phi

On Tue, Apr 3, 2012 at 2:13 AM, Loki Cheng <[email protected]> wrote:
> Hi, everyone
> I got stuck in training step 5 because the generated files are too large,
> such as "extract" and "extract.inv" listed below. I think that if I use the
> parameter "--extract-file $ROOT/model/extract.gz", the problem will be resolved.
> My question is:
> if I do that, does it affect anything else? Will it work? Any suggestions
> will be appreciated.
>
> total 198593660
> -rw-rw-r-- 1 loki loki    801392186 Apr  2 09:57 aligned.intersect
> -rw-rw-r-- 1 loki loki 101227601916 Apr  2 23:16 extract
> -rw-rw-r-- 1 loki loki 101227606014 Apr  2 23:16 extract.inv
> -rw-rw-r-- 1 loki loki     51632535 Apr  2 10:30 lex.e2f
> -rw-rw-r-- 1 loki loki     51632535 Apr  2 10:30 lex.f2e
>
> Best regards
> Moonloki
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
