> I've already tried this, and the alignment is rather pulled apart. There
> is barely one word (in a 115617-word, 663177-phoneme corpus) that is
> actually monotone, without a phoneme from another word mixed in or a
> phoneme from this word stolen into another word. While I'm not capable
> enough in SMT to analyze why this is, my feeling tells me that simply
> tuning a few parameters won't make the fundamental problem vanish?
> (PLEASE correct me if I'm wrong, I'm still a really bad rookie in this
> field ;P)
>
Although forcing monotone alignments sounds like a fairly large
change, you can use the "parameter" trick in this case because of how
Giza's HMM estimation works. It alternates between estimating a prior
distribution over transitions ("jumps") and the posterior distributions
over alignments once a sentence pair is observed. Because of how Bayes'
rule works, if your prior assigns 0 probability to some event (in this
case, a reverse jump), the posterior probability of that event will also
be 0. So if you initialize carefully, you'll be fine.
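To make that concrete, here is a toy sketch (just an illustration I put
together, not Giza code): the expected jump counts collected in the
E-step are always weighted by the current jump probability itself, so a
jump that starts at probability 0 collects 0 counts, and the M-step
renormalization keeps it at 0 in every later iteration.

toy_monotone_hmm.cc (hypothetical file name, illustration only):

// Shows that a jump probability initialized to 0 stays 0 under
// EM-style count collection + renormalization.
#include <iostream>
#include <map>

int main() {
  const int MAX_JUMP = 5;
  // Prior over jumps -MAX_JUMP..+MAX_JUMP: uniform on forward jumps
  // (>= 0), exactly zero on reverse jumps (< 0).
  std::map<int, double> p_jump;
  for (int j = -MAX_JUMP; j <= MAX_JUMP; ++j)
    p_jump[j] = (j >= 0) ? 1.0 / (MAX_JUMP + 1) : 0.0;

  std::map<int, double> count;
  for (int iter = 0; iter < 3; ++iter) {
    // E-step stand-in: the real expected count multiplies p_jump[j]
    // by forward/backward scores, so a zero prior contributes zero.
    for (const auto& kv : p_jump)
      count[kv.first] = kv.second * 42.0;
    // M-step: renormalize; zero counts re-estimate to zero.
    double total = 0.0;
    for (const auto& kv : count) total += kv.second;
    for (auto& kv : p_jump) kv.second = count[kv.first] / total;
  }
  for (const auto& kv : p_jump)
    std::cout << "jump " << kv.first << " -> " << kv.second << "\n";
  // Reverse jumps print 0: the zero survives every iteration.
  return 0;
}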


> On 08/24/2012 04:10 AM, Chris Dyer wrote:
>> I think adding this would have tremendous value.
> Yay! I'm not alone ;P. Especially in conjunction with PISA, I think this
> would open the door to a vast improvement in alignment quality for many
> use cases.
>
>> It should be possible to adapt Giza's HMM implementation to produce
>> monotone alignments. These are the changes that would be necessary
>> (and which should be fairly easy, if you can figure out the code):
>
> Unfortunately I'm still not familiar with the code. I'm reading and
> reading, but the almost complete absence of comments and the rather
> chaotic (?? to me at least ??) organization of the code doesn't make it
> easy for me :/.
>
>> 1) Alignment distribution initialization. By default, Giza initializes
>> the HMM transition probabilities to be uniform (effectively making the
>> first iteration of HMM training the same as one more iteration of
>> Model 1). You would need to alter this to make "reverse" jumps have
>> probability 0.
>>
>> 2) Smoothing. By default, Giza does something to prevent probabilities
>> from ending up at zero (add-alpha, maybe?). This is fine for monotone
>> jumps, but you want to make sure that "backward" jumps end up zero.
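[To make point (2) concrete: the sketch below is hypothetical, not the
actual Giza smoothing code. Whatever add-alpha / interpolation smoothing
is applied would need to be restricted to the forward jumps, so that
backward jumps keep their exact zero instead of having mass smoothed
back onto them.]

#include <map>

// Hypothetical smoothing step, restricted to forward jumps only.
void SmoothForwardOnly(std::map<int, double>& p_jump, double alpha) {
  // Count forward jump targets; only these receive smoothing mass.
  int n_forward = 0;
  for (const auto& kv : p_jump)
    if (kv.first >= 0) ++n_forward;
  for (auto& kv : p_jump) {
    if (kv.first >= 0)
      // interpolate with a uniform distribution over forward jumps only
      kv.second = (1.0 - alpha) * kv.second + alpha / n_forward;
    else
      kv.second = 0.0;  // backward jumps must stay exactly zero
  }
}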
>
> So, I'm really grateful for these tips! I've already tried to track down
> the "model5smoothfactor" parameter (I assumed it would lead me to where
> the distortion happens, since its doc-tag says it's the distortion
> smoothing). Sadly I had to find out that it looks like this parameter
> isn't used at all, since the parsing is commented out?!?
> main.cpp:
> //makeSetCommand("model5smoothfactor","0.0",getGlobalParSet(),2);
>
> A few tests on a small corpus with values of 1000, 1, 0, -1, -1000
> yielded *exactly* the same results! I'm a bit confused now ... is it to
> be expected that some of the GIZA++ parameters aren't even implemented
> despite the help advertising them?
>
> Anyways, I'll try to dig around in the code a bit more based on your
> tips. But if you happen to be familiar with the code, I'd be really,
> really grateful for some even more detailed pointers :(. I have a bad
> feeling that this will take a really long time if I have to wander the
> maze of GIZA code all alone ;P.
>
> Thanks to you all for your help and support! And Best Regards from Germany!
> - Dario Ernst
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
