Hi,

the running-in-parts option only affects the creation of the "cooc" (co-occurrence) file, which exists mostly for memory efficiency, so that GIZA++ does not run out of memory. It only makes sense to use this option if the cooc file creation itself runs out of memory.
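To illustrate why building the file in parts need not change the result, here is a minimal Python sketch. It is not the code from the training script: all function names are hypothetical, and it assumes that a cooc file is simply the sorted set of (source word id, target word id) pairs that occur together in at least one sentence pair. Under that assumption, the set for the whole corpus is the union of the sets for the parts, so only one part has to be held in memory at a time.

```python
from typing import List, Set, Tuple

# A sentence pair as (source word ids, target word ids). Hypothetical layout.
SentencePair = Tuple[List[int], List[int]]
CoocSet = Set[Tuple[int, int]]


def cooc_for_part(pairs: List[SentencePair]) -> CoocSet:
    """Collect every (source id, target id) pair seen in the same sentence pair."""
    cooc: CoocSet = set()
    for src, tgt in pairs:
        for s in set(src):
            for t in set(tgt):
                cooc.add((s, t))
    return cooc


def split_into_parts(corpus: List[SentencePair], n_parts: int) -> List[List[SentencePair]]:
    """Split the corpus into at most n_parts contiguous chunks."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    size = max(1, -(-len(corpus) // n_parts))  # ceiling division, never 0
    return [corpus[i:i + size] for i in range(0, len(corpus), size)]


def cooc_in_parts(corpus: List[SentencePair], n_parts: int) -> List[Tuple[int, int]]:
    """Build the co-occurrence set part by part and merge by set union."""
    merged: CoocSet = set()
    for part in split_into_parts(corpus, n_parts):
        merged |= cooc_for_part(part)  # duplicates across parts collapse here
    return sorted(merged)


if __name__ == "__main__":
    corpus = [([1, 2], [10, 11]), ([2, 3], [11, 12]), ([1], [12])]
    # The merged result is the same regardless of the number of parts.
    print(cooc_in_parts(corpus, 1) == cooc_in_parts(corpus, 3))
```

In this sketch the number of parts trades peak memory during cooc creation against the extra merge work; the merged output itself does not depend on it.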
-phi

On Wed, Dec 23, 2009 at 9:32 PM, Mark Fishel <[email protected]> wrote:
> Dear list,
>
> can anyone direct me to a description of the exact algorithm of
> running giza++ in parts? I know the co-occurrence file is used for
> more memory efficient storage of the translation table and probably
> basically defines which word pairs are to be included into the
> t-table. However I'm not sure how the combination of several
> co-occurrence files is performed if the training data is processed in
> several parts (--parts N). I tried reading the training script (the
> "run_single_giza_on_parts" sub) and the algorithm is still a mystery
> to me.
>
> Thank You in advance,
> Mark Fishel
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
