Dear list, can anyone direct me to a description of the exact algorithm used when running GIZA++ in parts? I know the co-occurrence file is used to store the translation table more memory-efficiently, and it probably defines which word pairs are included in the t-table. However, I'm not sure how several co-occurrence files are combined when the training data is processed in several parts (--parts N). I tried reading the training script (the "run_single_giza_on_parts" sub), but the algorithm is still a mystery to me.
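Here is a small Python sketch of my current guess, in case it makes the question clearer. This is an assumption on my part, not a reading of the actual script. Each part is reduced to the set of (source id, target id) pairs that co-occur in at least one of its sentence pairs, and the combined file is simply the sorted, de-duplicated union of those sets. The in-memory representation of the parts and the names `cooc_pairs` and `merge_cooc` are hypothetical. If the combination really is a plain union, splitting would not change which pairs end up in the t-table, because co-occurrence is defined per sentence pair:

```python
from itertools import product

# Hypothetical representation: a corpus part is a list of sentence pairs,
# each pair being (source word ids, target word ids), as in a .snt file.
SentencePair = tuple[list[int], list[int]]


def cooc_pairs(part: list[SentencePair]) -> set[tuple[int, int]]:
    """Collect every (source id, target id) pair that co-occurs in at least
    one sentence pair of this part (my assumption of what each cooc file holds)."""
    pairs: set[tuple[int, int]] = set()
    for src, tgt in part:
        pairs.update(product(src, tgt))
    return pairs


def merge_cooc(parts: list[list[SentencePair]]) -> list[tuple[int, int]]:
    """Assumed merge step: union of the per-part pairs, sorted numerically
    and de-duplicated (similar in spirit to `sort -n | uniq` over the files)."""
    merged: set[tuple[int, int]] = set()
    for part in parts:
        merged |= cooc_pairs(part)
    return sorted(merged)


corpus = [([2, 3], [5]), ([3, 4], [5, 6]), ([2], [6])]
parts = [corpus[:2], corpus[2:]]
merged = merge_cooc(parts)
print(merged)  # [(2, 5), (2, 6), (3, 5), (3, 6), (4, 5), (4, 6)]
# Under this assumption, the union over parts equals the cooc set of the unsplit corpus.
print(merged == sorted(cooc_pairs(corpus)))  # True
```

What I can't tell is whether the real combination step is this simple or involves something more (for example, pruning or counts).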
Thank you in advance,
Mark Fishel

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
