Hi Dr. Koehn,


Thanks for the input. I ran a small corpus of just these 4 sentences through 
GIZA++ and did get the correct result. However, when I ran my large corpus of 
77083 sentences through GIZA++, these 4 sentences were not aligned correctly. 
It seems that other parts of the corpus affected these 4 specific 
alignments.

Let me investigate more.

Best Regards,

Gang


At 2015-10-16 02:13:53, "Philipp Koehn" <[email protected]> wrote:

Hi,


I ran this corpus through GIZA++ and did get the correct result:


% head training/corpus.11.*
==> training/corpus.11.en <==
sandalo camufluge
sandalo daino
sandalo madras
sandalo vernice

==> training/corpus.11.fr <==
camufluge sandalo
daino sandalo
madras sandalo
vernice sandalo

% zcat training/giz*11/fr-en.A3.final.gz
# Sentence pair (1) source length 2 target length 2 alignment score : 0.536702
camufluge sandalo
NULL ({ }) sandalo ({ 2 }) camufluge ({ 1 })
# Sentence pair (2) source length 2 target length 2 alignment score : 0.536702
daino sandalo
NULL ({ }) sandalo ({ 2 }) daino ({ 1 })
# Sentence pair (3) source length 2 target length 2 alignment score : 0.536702
madras sandalo
NULL ({ }) sandalo ({ 2 }) madras ({ 1 })
# Sentence pair (4) source length 2 target length 2 alignment score : 0.536702
vernice sandalo
NULL ({ }) sandalo ({ 2 }) vernice ({ 1 })


So, something must have gone wrong on your end. Are you sure that you are 
preparing the data in the correct format?
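
For anyone who wants to check the alignments for specific sentence pairs in a larger run, here is a minimal Python sketch that reads the third line of an A3.final record like the ones quoted above. The function names are hypothetical and not part of GIZA++ or Moses. It assumes the layout shown in the output: each word is followed by "({ ... })" containing the 1-based positions of the words it aligns to in the sentence on the preceding line.

```python
import re

# One "word ({ i j ... })" group from the third line of an A3 record.
TOKEN_RE = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def parse_a3_alignment_line(line):
    """Return a list of (word, [1-based positions]) in the order they appear."""
    return [(word, [int(i) for i in idx.split()])
            for word, idx in TOKEN_RE.findall(line)]


def to_index_pairs(line):
    """Convert one alignment line to 0-based (i, j) pairs.

    i indexes the words listed on this line (NULL excluded),
    j indexes the words of the sentence on the preceding line.
    Words attached to NULL produce no pair.
    """
    pairs = []
    for pos, (_, targets) in enumerate(parse_a3_alignment_line(line)):
        if pos == 0:  # the first group is always the NULL token
            continue
        for t in targets:
            pairs.append((pos - 1, t - 1))
    return pairs


example = "NULL ({ }) sandalo ({ 2 }) camufluge ({ 1 })"
print(to_index_pairs(example))
```

For sentence pair (1) above this gives the swapped alignment, with "sandalo" linked to the second word and "camufluge" to the first. The pair orientation here is an assumption of the sketch and may need to be flipped to match the files that the Moses training script writes.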


-phi


On Fri, Oct 9, 2015 at 7:02 AM, gang tang <[email protected]> wrote:

Dear All,

Since there are no answers to my questions, I assume that there are no easy 
fixes to the alignment problem. However, just out of curiosity, shouldn't there 
be alignment tools that take lexical considerations into account while aligning 
a parallel corpus? I mean, alignment tools that look up translations for 
specific words in a domain-specific dictionary during alignment? Is there a 
reason why this is not considered an interesting area to explore?


Best Regards,


Gang



At 2015-09-25 19:34:13, "gang tang" <[email protected]> wrote:

Dear all,

I have a problem with alignment. I'd greatly appreciate it if anyone could 
help solve my issue.

I have the following corpus:

"sandalo camufluge" -> "camufluge sandal"
"sandalo daino" -> "daino sandal"
"sandalo madras" -> "madras sandal"
"sandalo vernice" -> "vernice sandal"

The alignment software I used was GIZA++, and the alignment result was always 
0-0 1-1, which meant that "sandalo" wasn't aligned with "sandal". After 
training, the phrase.translation.table always had entries such as "sandalo" -> 
"camufluge", "sandalo" -> "daino", "sandalo" -> "madras", and 
"sandalo" -> "vernice", but no "sandalo" -> "sandal". Is there any way to solve 
this problem? Could I add more data to align "sandalo" with "sandal" and 
translate "sandalo" to "sandal"? How should I tune the system?
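
To make the "add more data" idea concrete, here is a minimal Python sketch that appends dictionary entries to both sides of the corpus as extra one-word sentence pairs before alignment. This is only an illustration of one common workaround, not something confirmed in this thread: the function name, the `lexicon` list and the `repeat` parameter are all hypothetical, and whether it fixes the alignment on a given corpus has not been tested here.

```python
def add_lexicon_pairs(src_lines, tgt_lines, lexicon, repeat=1):
    """Return new (src, tgt) lists with every lexicon entry appended
    `repeat` times as an extra sentence pair.

    src_lines / tgt_lines: parallel lists of tokenized sentences.
    lexicon: list of (source_word, target_word) tuples (hypothetical input).
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    src_out = list(src_lines)
    tgt_out = list(tgt_lines)
    for _ in range(repeat):
        for src_word, tgt_word in lexicon:
            # Each entry becomes a one-word sentence pair, kept line-aligned.
            src_out.append(src_word)
            tgt_out.append(tgt_word)
    return src_out, tgt_out


src = ["sandalo camufluge", "sandalo daino", "sandalo madras", "sandalo vernice"]
tgt = ["camufluge sandal", "daino sandal", "madras sandal", "vernice sandal"]
lexicon = [("sandalo", "sandal")]  # assumed dictionary entry

new_src, new_tgt = add_lexicon_pairs(src, tgt, lexicon, repeat=3)
print(len(new_src), len(new_tgt))
```

The two output lists would then be written out line by line as the training files. The repeat count is only a rough knob for how strongly the dictionary entries influence the aligner's co-occurrence statistics.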

Thanks for your attention,

Gang







_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support


