Hi,

The running-in-parts option only affects the creation of the "cooc"
file. That file exists mostly for memory efficiency, so that GIZA++
does not run out of memory. It only makes sense to use this option
if the cooc file creation itself runs out of memory.
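
As a rough illustration of why splitting helps (a sketch only, not the
code in the training script): the cooc file is in essence the set of
source/target word-ID pairs that occur together in at least one
sentence pair. That set can be collected per part and the per-part
results merged afterwards, so only one part has to be processed at a
time. The function names and the in-memory corpus format below are
made up for the example.

```python
import heapq
from itertools import groupby


def cooc_pairs(sentence_pairs):
    """Return the sorted, de-duplicated (source_id, target_id) pairs of one part."""
    pairs = set()
    for src, tgt in sentence_pairs:
        for s in set(src):
            for t in set(tgt):
                pairs.add((s, t))
    return sorted(pairs)


def split_into_parts(sentence_pairs, n_parts):
    """Cut the corpus into at most n_parts contiguous chunks (hypothetical helper)."""
    size = max(1, -(-len(sentence_pairs) // n_parts))  # ceiling division
    return [sentence_pairs[i:i + size] for i in range(0, len(sentence_pairs), size)]


def merge_parts(sorted_parts):
    """Merge sorted per-part pair lists, dropping pairs seen in more than one part."""
    merged = heapq.merge(*sorted_parts)
    return [pair for pair, _ in groupby(merged)]


# Toy corpus: each entry is (source word IDs, target word IDs).
corpus = [([1, 2], [10, 11]), ([2, 3], [11, 12]), ([1, 3], [10, 12])]

whole = cooc_pairs(corpus)
in_parts = merge_parts([cooc_pairs(p) for p in split_into_parts(corpus, 2)])
assert whole == in_parts  # same pairs either way
```

In this sketch the merged result is identical to the one computed on
the whole corpus, which is consistent with the option changing only
memory use during cooc creation.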

-phi

On Wed, Dec 23, 2009 at 9:32 PM, Mark Fishel <[email protected]> wrote:
> Dear list,
>
> can anyone direct me to a description of the exact algorithm of
> running giza++ in parts? I know the co-occurrence file is used for
> more memory efficient storage of the translation table and probably
> basically defines which word pairs are to be included into the
> t-table. However I'm not sure how the combination of several
> co-occurrence files is performed if the training data is processed in
> several parts (--parts N). I tried reading the training script (the
> "run_single_giza_on_parts" sub) and the algorithm is still a mystery
> to me.
>
> Thank You in advance,
> Mark Fishel
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>