Dear Gang,

I don't know of any tool that does word alignment using a dictionary. However, Hunalign does sentence alignment with the help of dictionaries.

I have done some promising experiments using dictionaries to clean sentence-aligned corpora. I found that:

- dictionaries with domain-specific vocabulary are very beneficial
- poor dictionaries, e.g. ones created with GIZA++, are still somewhat beneficial
- dictionaries are best used to prevent suspicious sentence pairs from being unduly removed. Using them the other way around, to remove pairs, may discard a lot of good pairs that contain uncommon words.
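To make the last point concrete, here is a minimal sketch in Python of that "rescue" idea. It is not the setup from my experiments: the length-ratio test for "suspicious", the thresholds, the whitespace tokenization, and all function names are assumptions chosen only for illustration.

```python
def dictionary_coverage(src_tokens, tgt_tokens, dictionary):
    """Fraction of source tokens that have a dictionary translation in the target."""
    if not src_tokens:
        return 0.0
    tgt = set(tgt_tokens)
    hits = sum(1 for word in src_tokens if dictionary.get(word, set()) & tgt)
    return hits / len(src_tokens)


def is_suspicious(src_tokens, tgt_tokens, max_ratio=2.0):
    """Hypothetical heuristic: flag pairs whose token counts differ too much."""
    shorter = min(len(src_tokens), len(tgt_tokens))
    longer = max(len(src_tokens), len(tgt_tokens))
    return shorter == 0 or longer / shorter > max_ratio


def clean_corpus(pairs, dictionary, min_coverage=0.5, max_ratio=2.0):
    """Keep unsuspicious pairs; keep suspicious ones only if the dictionary vouches for them."""
    kept = []
    for src, tgt in pairs:
        src_tokens = src.lower().split()
        tgt_tokens = tgt.lower().split()
        if not is_suspicious(src_tokens, tgt_tokens, max_ratio):
            kept.append((src, tgt))  # never removed on dictionary evidence alone
        elif dictionary_coverage(src_tokens, tgt_tokens, dictionary) >= min_coverage:
            kept.append((src, tgt))  # suspicious, but rescued by the dictionary
    return kept


# Toy data: the dictionary maps a source word to a set of known translations.
toy_dictionary = {"sandalo": {"sandal"}}
toy_pairs = [
    ("sandalo vernice", "vernice sandal"),       # not suspicious
    ("sandalo", "a sandal made of leather"),     # suspicious, rescued
    ("sandalo", "click here to subscribe now"),  # suspicious, no support
]
cleaned = clean_corpus(toy_pairs, toy_dictionary)
```

Note that the dictionary is only consulted for pairs that are already flagged, so a pair full of uncommon words that the dictionary does not cover is never removed for that reason alone.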
Yours,
Per Tunedal

On Fri, Oct 9, 2015, at 13:02, gang tang wrote:
> Dear All,
>
> Since there are no answers to my questions, I assume that there are no
> easy fixes to the alignment problem. However, just out of curiosity,
> shouldn't there be alignment tools that take lexical considerations
> into account while aligning a parallel corpus? I mean, alignment tools
> that look up translations for specific words in a domain-specific
> dictionary during alignment? Could there be any reason that it is not
> an interesting area to explore?
>
> Best Regards,
> Gang
>
> At 2015-09-25 19:34:13, "gang tang" wrote:
>> Dear all,
>>
>> I have a problem with alignment. I'd greatly appreciate it if anyone
>> can help solve my issue.
>>
>> I have the following corpus:
>>
>> "sandalo camufluge" -> "camufluge sandal"
>> "sandalo daino" -> "daino sandal"
>> "sandalo madras" -> "madras sandal"
>> "sandalo vernice" -> "vernice sandal"
>>
>> The alignment software I used was GIZA++, and the alignment result
>> was always 0-0 1-1, which meant that "sandalo" wasn't aligned with
>> "sandal". After training, phrase.translation.table always had
>> entries such as "sandalo" -> "camufluge", "sandalo" -> "daino",
>> "sandalo" -> "madras", and "sandalo" -> "vernice", and no
>> "sandalo" -> "sandal". Is there any way this problem could be solved?
>> Could I add more data to align "sandalo" with "sandal" and translate
>> "sandalo" to "sandal"? How should I tune the system?
>>
>> Thanks for your attention,
>>
>> Gang
>
> _________________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
