Hi Dr. Koehn,
Thanks for the input. I ran a small corpus of just 4 sentences through GIZA++ and did get the correct result. However, when I ran my large corpus of 77083 sentences through GIZA++, these 4 sentences were not aligned correctly. Other parts of the corpus must have affected these 4 specific alignments. Let me investigate further.

Best Regards,
Gang

At 2015-10-16 02:13:53, "Philipp Koehn" <[email protected]> wrote:

Hi,

I ran this corpus through GIZA++ and did get the correct result:

% head training/corpus.11.*
==> training/corpus.11.en <==
sandalo camufluge
sandalo daino
sandalo madras
sandalo vernice

==> training/corpus.11.fr <==
camufluge sandalo
daino sandalo
madras sandalo
vernice sandalo

% zcat training/giz*11/fr-en.A3.final.gz
# Sentence pair (1) source length 2 target length 2 alignment score : 0.536702
camufluge sandalo
NULL ({ }) sandalo ({ 2 }) camufluge ({ 1 })
# Sentence pair (2) source length 2 target length 2 alignment score : 0.536702
daino sandalo
NULL ({ }) sandalo ({ 2 }) daino ({ 1 })
# Sentence pair (3) source length 2 target length 2 alignment score : 0.536702
madras sandalo
NULL ({ }) sandalo ({ 2 }) madras ({ 1 })
# Sentence pair (4) source length 2 target length 2 alignment score : 0.536702
vernice sandalo
NULL ({ }) sandalo ({ 2 }) vernice ({ 1 })

So, something must have gone wrong on your end. Are you sure that you are preparing the data in the correct format?

-phi

On Fri, Oct 9, 2015 at 7:02 AM, gang tang <[email protected]> wrote:

Dear All,

Since there are no answers to my questions, I assume that there are no easy fixes to the alignment problem. However, just out of curiosity, shouldn't there be alignment tools that take lexical considerations into account while aligning a parallel corpus? I mean, alignment tools that look up translations for specific words in a domain-specific dictionary during alignment? Is there any reason why this is not an interesting area to explore?

Best Regards,
Gang

At 2015-09-25 19:34:13, "gang tang" <[email protected]> wrote:

Dear all,

I have a problem with alignment. I'd greatly appreciate it if anyone could help solve my issue. I have the following corpus:

"sandalo camufluge" -> "camufluge sandal"
"sandalo daino" -> "daino sandal"
"sandalo madras" -> "madras sandal"
"sandalo vernice" -> "vernice sandal"

The alignment software I used was GIZA++, and the alignment result was always 0-0 1-1, which meant that "sandalo" wasn't aligned with "sandal". After training, phrase.translation.table always had entries such as "sandalo" -> "camufluge", "sandalo" -> "daino", "sandalo" -> "madras", and "sandalo" -> "vernice", and no "sandalo" -> "sandal".

Is there any way this problem could be solved? Could I add more data to align "sandalo" with "sandal" and translate "sandalo" to "sandal"? How should I tune the system?

Thanks for your attention,
Gang

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
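Note for readers comparing the two alignment notations in this thread: the A3.final records quoted by Philipp list, after each token on the third line, the 1-based positions of the words on the second line that it is aligned to, while Gang's first message quotes 0-based "i-j" pairs. The following is a minimal Python sketch of converting one notation to the other. The function names are hypothetical, and it assumes the first group on the alignment line is always the NULL token and that the pairs are written as (position of the token on the third line)-(position of the word on the second line); check the pair order against your own tooling before relying on it.

```python
import re

# One "token ({ i j ... })" group from the third line of an A3 record.
GROUP = re.compile(r"(\S+) \(\{([\d ]*)\}\)")

def a3_to_pairs(alignment_line):
    """Convert an A3 alignment line to 0-based (token, word) index pairs.

    Assumption: the first group is the NULL token, so it is skipped and
    the remaining tokens are numbered from 0.
    """
    groups = GROUP.findall(alignment_line)
    pairs = []
    for token_pos, (_token, indices) in enumerate(groups[1:]):
        for idx in indices.split():
            # A3 positions are 1-based; shift them to 0-based.
            pairs.append((token_pos, int(idx) - 1))
    return pairs

def format_pairs(pairs):
    """Render index pairs in the "i-j" notation used in the thread."""
    return " ".join(f"{i}-{j}" for i, j in pairs)

line = "NULL ({ }) sandalo ({ 2 }) camufluge ({ 1 })"
print(format_pairs(a3_to_pairs(line)))  # prints "0-1 1-0"
```

For the first sentence pair in Philipp's output this gives the crossed alignment 0-1 1-0, in contrast to the monotone 0-0 1-1 that Gang reported from the large corpus.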
