Dear Gang,

To clarify: I used my own scripts for my experiments. I tried improving
already aligned data (e.g. samples from the Europarl corpus) by
eliminating bad alignments. I was inspired by Hunalign to try using
dictionaries. That approach proved useful for preventing the undue
removal of alignments by other tests.
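
A minimal sketch of such a dictionary check, not the actual scripts used
in these experiments. It assumes whitespace tokenization and a dictionary
that maps each source word to a set of target words. The names
dictionary_coverage, clean, length_ratio_suspicious and the two
thresholds are hypothetical, and the length-ratio test only stands in
for whatever "other tests" flag a pair as suspicious. The dictionary is
used solely to rescue flagged pairs, never to remove pairs on its own:

```python
def dictionary_coverage(src_tokens, tgt_tokens, dictionary):
    """Fraction of source tokens with a dictionary translation in the target."""
    if not src_tokens:
        return 0.0
    tgt = set(tgt_tokens)
    hits = sum(1 for w in src_tokens if tgt.intersection(dictionary.get(w, ())))
    return hits / len(src_tokens)


def length_ratio_suspicious(src_tokens, tgt_tokens, max_ratio=2.0):
    """Stand-in for the 'other tests': flag pairs with very different lengths."""
    longer = max(len(src_tokens), len(tgt_tokens))
    shorter = max(1, min(len(src_tokens), len(tgt_tokens)))
    return longer / shorter > max_ratio


def clean(pairs, dictionary, is_suspicious, rescue_threshold=0.5):
    """Drop a pair only if it is suspicious AND the dictionary cannot rescue it."""
    kept = []
    for src, tgt in pairs:
        src_tokens = src.lower().split()
        tgt_tokens = tgt.lower().split()
        if is_suspicious(src_tokens, tgt_tokens):
            coverage = dictionary_coverage(src_tokens, tgt_tokens, dictionary)
            if coverage < rescue_threshold:
                continue  # suspicious and no dictionary support: remove
        kept.append((src, tgt))
    return kept


# Toy data, made up for illustration only.
dictionary = {"sandalo": {"sandal"}, "casa": {"house", "home"}}
pairs = [
    ("sandalo vernice", "vernice sandal"),
    ("casa", "the small house on the hill"),
    ("casa", "see page 12 for further details"),
]
kept = clean(pairs, dictionary, length_ratio_suspicious)
```

In this toy run the second pair is flagged by the length test but kept
because "casa" has a dictionary translation in the target, while the
third pair is flagged and has no dictionary support.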

BTW, there was an interesting discussion about using dictionaries in
Moses some 6 months ago. Different approaches were discussed, like
training on dictionaries or using them as a back-off. That might be of
interest to you.
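
One possible reading of "training on dictionaries" is to append each
dictionary entry to the parallel corpus as an extra one-word sentence
pair before word alignment. The sketch below assumes that reading; it
is not a summary of what was proposed in that discussion. The function
name append_dictionary and the repeat parameter are hypothetical, and
whether this changes the resulting alignment has not been tested here:

```python
def append_dictionary(src_lines, tgt_lines, dictionary, repeat=1):
    """Return copies of the corpus with dictionary entries appended as pairs.

    Each (source word, target word) entry becomes one extra sentence pair,
    repeated `repeat` times to give it more weight during training.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    out_src, out_tgt = list(src_lines), list(tgt_lines)
    for src_word in sorted(dictionary):
        for tgt_word in sorted(dictionary[src_word]):
            for _ in range(repeat):
                out_src.append(src_word)
                out_tgt.append(tgt_word)
    return out_src, out_tgt


# The four-line corpus from the original question.
src = ["sandalo camufluge", "sandalo daino", "sandalo madras", "sandalo vernice"]
tgt = ["camufluge sandal", "daino sandal", "madras sandal", "vernice sandal"]

new_src, new_tgt = append_dictionary(src, tgt, {"sandalo": {"sandal"}}, repeat=3)
```

The two returned lists would then be written out, one sentence per
line, as the source and target training files.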

Good luck with your explorations!

Yours, Per Tunedal


On Thu, Oct 22, 2015, at 09:56, gang tang wrote:
>
> Dear Per,
>
> Thanks for your kind suggestions. I am digging into my data and the
> source code of giza++ to find out what happened to my precious pair of
> "sandalo vernice" and "vernice sandal". I will certainly look into how
> to utilize Hunalign to advance my cause later on.
>
>
>
> Thanks again, and best regards,
>
> Gang
>
> At 2015-10-21 22:03:36, "Per Tunedal" <[email protected]> wrote:
>> Dear Gang,
>>
>> I don't know of any tool for word alignment that uses a dictionary.
>> However, Hunalign does sentence alignment with the help of
>> dictionaries. I have done some promising experiments using
>> dictionaries to clean sentence-aligned corpora. I found that:
>> - dictionaries with domain-specific vocabulary are very beneficial
>> - bad dictionaries, e.g. created with GIZA++, are somewhat beneficial
>> - dictionaries are best used to prevent suspicious sentence pairs
>>   from being unduly removed. The other way around may remove a lot of
>>   good pairs with uncommon words.
>>
>> Yours, Per Tunedal
>>
>>
>> On Fri, Oct 9, 2015, at 13:02, gang tang wrote:
>>> Dear All,
>>>
>>> Since there are no answers to my questions, I assume that there are
>>> no easy fixes to the alignment problem. However, just out of
>>> curiosity, shouldn't there be alignment tools that take lexical
>>> considerations into account while aligning a parallel corpus? I
>>> mean, alignment tools that look up translations for specific words
>>> in a domain-specific dictionary during alignment? Could there be any
>>> reason that it is not an interesting area to explore?
>>>
>>> Best Regards, Gang
>>>
>>>
>>>
>>> At 2015-09-25 19:34:13, "gang tang" <[email protected]> wrote:
>>>> Dear all,
>>>>
>>>> I have a problem with alignment. I'd greatly appreciate it if
>>>> anyone could help me solve it.
>>>>
>>>> I have the following corpus:
>>>>
>>>> "sandalo camufluge" -> "camufluge sandal"
>>>> "sandalo daino" -> "daino sandal"
>>>> "sandalo madras" -> "madras sandal"
>>>> "sandalo vernice" -> "vernice sandal"
>>>>
>>>> The alignment software I used was GIZA++, and the alignment result
>>>> was always 0-0 1-1, which meant that "sandalo" wasn't aligned with
>>>> "sandal". After training, phrase.translation.table always had
>>>> entries such as "sandalo" -> "camufluge", "sandalo" -> "daino",
>>>> "sandalo" -> "madras", and "sandalo" -> "vernice", but no
>>>> "sandalo" -> "sandal". Is there any way this problem could be
>>>> solved? Could I add more data to align "sandalo" with "sandal" and
>>>> translate "sandalo" to "sandal"? How should I tune the system?
>>>>
>>>> Thanks for your attention,
>>>>
>>>> Gang
>>>>
>>> _________________________________________________
>>> Moses-support mailing list [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>