Hi Sir,
Thank you for replying to my mail. Yes, I have thought about this solution for the alignments, but the heuristics used in Moses got me thinking, and I wanted to use the heuristic to obtain the final alignments (since those alignments are of a higher quality). So my question would be more like: could I replace the function of GIZA++ alone (computing the alignments in both directions) with a customized aligner?

Regarding the next question: when we augment the parallel corpus with entries from a bilingual dictionary, the alignments are computed over the entire corpus. Now the probability that a word s in the source language is translated to an MWE t1 t2 t3 in the target language needs to be computed. Initially, GIZA++ treats the event of s being translated to t1 as equally likely as the event of s being translated to t1 t2 t3. Because of this, even after GIZA++ completes its EM iterations, the probabilities of impossible events, such as s being translated to t1 alone or t2 alone, are not zero. I hope I am not being too vague about this problem.

The thing is, using the dictionary did not improve the quality of the alignments obtained on the same corpora. We worked on the English-Hindi pair using the 'tourism' corpus. The dictionary is considerably large and has about 20,000k entries.

Thank you.

Regards,
Prasanth

On Mon, Nov 29, 2010 at 9:21 PM, Philipp Koehn <[email protected]> wrote:
> Hi,
>
> > I am familiar with the architecture of Moses, and know that the 2nd and
> > 3rd steps involve computing alignments in both directions while the 4th
> > step applies the heuristic (grow, union, ...) to obtain the final
> > alignments. These alignments are further used to extract the
> > phrase-pairs. Now my question is, what would be the best way to
> > incorporate the alignments into Moses. One way would be to duplicate
> > the files generated by GIZA++ in both steps 2 and 3, and start the
> > training procedure from step 4. However, I was wondering if there was a
> > much simpler method to use the customized alignments in Moses.
>
> If you have your own alignment method, it would be best to skip the
> word alignment steps of the training steps and start with step 4.
> http://www.statmt.org/moses/?n=FactoredTraining.HomePage
>
> > Also in the process of MT, if I wanted to use a bilingual dictionary,
> > would it be ideal to use the dictionary in GIZA++ while computing the
> > alignments, or to augment the corpus with the entries in the
> > dictionary. Most of the target words for the entries in the dictionary
> > are MWEs, and hence augmenting the corpus did not bring about any
> > improvements when we conducted the experiments. Could you kindly
> > suggest an appropriate method to be used in this context.
>
> I am not sure what the problem is here - the inclusion of a dictionary as
> additional parallel corpus data is the standard method. I am not entirely
> sure why their translations as MWEs should be a problem.
>
> -phi
>
> --
> "Theories have four stages of acceptance. i) this is worthless nonsense;
> ii) this is an interesting, but perverse, point of view; iii) this is
> true, but quite unimportant; iv) I always said so." --- J.B.S. Haldane
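For concreteness, here is a toy sketch of the symmetrization heuristics under discussion (intersection, union, and a simplified "grow"). This is only an illustration of the idea, not Moses's actual implementation; the function name and variable names are my own.

```python
# Toy sketch of alignment symmetrization (illustration only, not Moses code).
# Directional alignments are represented as sets of (src_index, tgt_index) links.

def symmetrize(s2t, t2s, heuristic="grow"):
    """Combine source->target and target->source Viterbi alignments.

    'intersection' is high precision, 'union' is high recall; 'grow'
    starts from the intersection and repeatedly adds union links that
    neighbour an already-accepted link.
    """
    inter = s2t & t2s
    union = s2t | t2s
    if heuristic == "intersection":
        return inter
    if heuristic == "union":
        return union
    # "grow": extend the intersection with adjacent union links
    aligned = set(inter)
    added = True
    while added:
        added = False
        for (i, j) in sorted(union - aligned):
            neighbours = {(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)}
            if neighbours & aligned:
                aligned.add((i, j))
                added = True
    return aligned

# Example: the two GIZA++ directions disagree on one link each.
s2t = {(0, 0), (1, 1), (2, 1)}
t2s = {(0, 0), (1, 1), (1, 2)}
print(sorted(symmetrize(s2t, t2s, "intersection")))  # [(0, 0), (1, 1)]
print(sorted(symmetrize(s2t, t2s, "grow")))  # [(0, 0), (1, 1), (1, 2), (2, 1)]
```

The grow-diag-final heuristic in Moses refines this further by also considering diagonal neighbours and by a final pass over unaligned words, but the basic idea is the same.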
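To make the MWE issue concrete, here is a toy IBM Model 1 EM trainer (a heavily simplified version of what GIZA++ does in its Model 1 iterations; it omits the NULL word, and all names are my own). With a dictionary entry pairing s with the MWE t1 t2 t3, EM has no evidence to separate the three target words, so p(t1 | s) stays at 1/3 instead of going to zero:

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """Toy IBM Model 1 EM trainer (no NULL word).

    corpus: list of (src_words, tgt_words) sentence pairs.
    Returns t where t[f][e] estimates p(e | f).
    """
    src_vocab = {w for s, _ in corpus for w in s}
    tgt_vocab = {w for _, t in corpus for w in t}
    # uniform initialisation of t(e | f)
    t = {f: {e: 1.0 / len(tgt_vocab) for e in tgt_vocab} for f in src_vocab}
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        # E-step: collect expected counts
        for s_words, t_words in corpus:
            for e in t_words:
                z = sum(t[f][e] for f in s_words)  # normalisation
                for f in s_words:
                    c = t[f][e] / z
                    count[f][e] += c
                    total[f] += c
        # M-step: re-estimate translation probabilities
        for f in src_vocab:
            for e in tgt_vocab:
                t[f][e] = count[f][e] / total[f]
    return t

# A dictionary entry pairing the word s with the MWE "t1 t2 t3".
corpus = [(["s"], ["t1", "t2", "t3"])]
t = train_model1(corpus)
# EM cannot tell the three target words apart: each keeps probability
# 1/3 given s, so the "impossible" event s -> t1 alone never reaches zero.
print(round(t["s"]["t1"], 3))  # 0.333
```

This is exactly the symptom described above: the lexical model distributes mass over every word of the MWE, and no number of EM iterations will drive p(t1 | s) or p(t2 | s) to zero from this evidence alone.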
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
