Dear Per,

Thanks for your kind suggestions. I am digging into my data and the source code
of GIZA++ to find out what happened to my precious pair of "sandalo vernice"
and "vernice sandal". I will certainly look into how to use Hunalign to
advance my cause later on.


Thanks again, and best regards,

Gang


At 2015-10-21 22:03:36, "Per Tunedal" <[email protected]> wrote:

Dear Gang,

I don't know of any tool for word alignment using a dictionary. However, Hunalign
does sentence alignment with the help of dictionaries.

I have done some promising experiments using dictionaries to clean sentence-aligned
corpora. I found that:

- dictionaries with domain-specific vocabulary are very beneficial

- bad dictionaries, e.g. ones created with GIZA++, are somewhat beneficial

- dictionaries are best used to prevent suspicious sentence pairs from being unduly
removed. Using them the other way around, to decide which pairs to remove, may
discard a lot of good pairs with uncommon words.
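The third point can be sketched as a filter in which the dictionary only ever rescues pairs that some other check has flagged, and never removes anything on its own. This is a minimal sketch under stated assumptions: whitespace-tokenised text, a set of flagged pair indices produced by some other (unspecified) cleaning step, and a dictionary mapping each source word to its accepted translations. The function names and the min_support threshold are hypothetical, not part of Hunalign or any other tool.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical dictionary format: source word -> accepted target translations.
Dictionary = Dict[str, Set[str]]


def dictionary_support(src: str, tgt: str, dictionary: Dictionary) -> int:
    """Count source tokens with at least one dictionary translation in the target."""
    tgt_tokens = set(tgt.lower().split())
    return sum(
        1 for word in src.lower().split()
        if dictionary.get(word, set()) & tgt_tokens
    )


def clean(
    pairs: List[Tuple[str, str]],
    flagged: Set[int],
    dictionary: Dictionary,
    min_support: int = 1,
) -> List[Tuple[str, str]]:
    """Drop a flagged pair only if the dictionary offers no evidence for it.

    Unflagged pairs are always kept, so rare words missing from the
    dictionary can never cause a good pair to be removed.
    """
    kept = []
    for index, (src, tgt) in enumerate(pairs):
        if index in flagged and dictionary_support(src, tgt, dictionary) < min_support:
            continue  # suspicious and unsupported by the dictionary: discard
        kept.append((src, tgt))
    return kept
```

For example, with {"sandalo": {"sandal"}} as the dictionary, a flagged pair ("sandalo vernice", "vernice sandal") survives, a flagged pair with no dictionary match is dropped, and unflagged pairs pass through untouched.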

 
Yours,

Per Tunedal

 
 
On Fri, Oct 9, 2015, at 13:02, gang tang wrote:

Dear All,

 
Since there have been no answers to my questions, I assume that there is no easy
fix for the alignment problem. However, just out of curiosity, shouldn't there
be alignment tools that take lexical information into account while aligning a
parallel corpus? I mean alignment tools that look up translations of specific
words in a domain-specific dictionary during alignment. Is there a reason this
is not an interesting area to explore?
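A common workaround, short of a dictionary-aware aligner, is to append dictionary entries to the training corpus as extra one-word sentence pairs before running GIZA++, so that co-occurrence counts favour the dictionary translation. This is a minimal sketch, assuming a line-aligned source/target corpus held in memory; the function name and the repeat parameter are illustrative, and how many copies actually help is something to tune rather than a known constant.

```python
from typing import Dict, List, Tuple


def augment_with_dictionary(
    src_lines: List[str],
    tgt_lines: List[str],
    dictionary: Dict[str, str],
    repeat: int = 1,
) -> Tuple[List[str], List[str]]:
    """Return copies of the corpus with each dictionary entry appended as a sentence pair.

    `repeat` controls how many times each entry is added, i.e. how strongly
    the dictionary is weighted relative to the real data (hypothetical knob).
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target corpora must have the same number of lines")
    src_out = list(src_lines)
    tgt_out = list(tgt_lines)
    # Sorted for a deterministic output order.
    for src_word, tgt_word in sorted(dictionary.items()):
        for _ in range(repeat):
            src_out.append(src_word)
            tgt_out.append(tgt_word)
    return src_out, tgt_out
```

The augmented lists would then be written back out, one sentence per line, as the source and target corpus files used for word alignment. The extra pairs also end up in the training data for the phrase table, which may or may not be desirable.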

 
Best Regards,

Gang

 
 
 
At 2015-09-25 19:34:13, "gang tang" <[email protected]> wrote:

Dear all,

 
I have a problem with alignment. I'd greatly appreciate it if anyone could help
me solve it.

 
I have the following corpus:

 
"sandalo camufluge" -> "camufluge sandal"

"sandalo daino" -> "daino sandal"

"sandalo madras" -> "madras sandal"

"sandalo vernice" -> "vernice sandal"

 
The alignment software I used was GIZA++, and the alignment result was always
0-0 1-1, which means that "sandalo" was never aligned with "sandal". After
training, phrase.translation.table always had entries such as "sandalo" ->
"camufluge", "sandalo" -> "daino", "sandalo" -> "madras" and
"sandalo" -> "vernice", but no "sandalo" -> "sandal". Is there any way to solve
this problem? Could I add more data so that "sandalo" is aligned with "sandal"
and translated as "sandal"? How should I tune the system?
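To see that the four pairs above do contain a lexical signal for "sandalo" -> "sandal", here is a small IBM Model 1 EM trainer, the purely lexical model that GIZA++'s training sequence starts from. This is a sketch, not GIZA++: it leaves out the NULL word, the HMM and the fertility/distortion models of the later stages, so it does not explain why GIZA++ itself produced 0-0 1-1. The function names are hypothetical, and the alignment string uses the same i-j notation as above (source index first).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (source tokens, target tokens)
Table = Dict[Tuple[str, str], float]  # (target word, source word) -> p(target | source)


def train_model1(corpus: List[Pair], iterations: int = 10) -> Table:
    """Estimate IBM Model 1 lexical probabilities with EM (no NULL word)."""
    tgt_vocab = {e for _, tgt in corpus for e in tgt}
    uniform = 1.0 / len(tgt_vocab)
    table: Table = defaultdict(lambda: uniform)  # uniform start
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: share each target word's count among the source words in its sentence.
        for src, tgt in corpus:
            for e in tgt:
                norm = sum(table[(e, f)] for f in src)
                for f in src:
                    frac = table[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise the expected counts per source word.
        table = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return table


def viterbi_alignment(src: List[str], tgt: List[str], table: Table) -> str:
    """Link each target word to its most probable source word, as 'i-j' pairs."""
    links = []
    for j, e in enumerate(tgt):
        best = max(range(len(src)), key=lambda k: table[(e, src[k])])
        links.append((best, j))
    return " ".join(f"{i}-{j}" for i, j in sorted(links))


corpus = [
    (s.split(), t.split())
    for s, t in [
        ("sandalo camufluge", "camufluge sandal"),
        ("sandalo daino", "daino sandal"),
        ("sandalo madras", "madras sandal"),
        ("sandalo vernice", "vernice sandal"),
    ]
]
table = train_model1(corpus)
print(viterbi_alignment(["sandalo", "vernice"], ["vernice", "sandal"], table))  # 0-1 1-0
```

Because "sandalo" co-occurs with "sandal" in every pair while each modifier appears only once, Model 1 moves probability mass towards "sandalo" -> "sandal" with each iteration. At least two iterations are needed here, since after the first one "sandal" is equally likely under both source words.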

 
Thanks for your attention,

 
Gang

_______________________________________________

Moses-support mailing list

[email protected]

http://mailman.mit.edu/mailman/listinfo/moses-support
