Hey Marcin,

On 08/24/2012 12:52 PM, Marcin Junczys-Dowmunt wrote:
> You are aligning unsegmented words to segmented phoneme sequences, do I
> understand that correctly? Maybe it's worth to use letter sequences
> instead of words and replace spaces with special characters.
> Like this:
>
> t h i s _ i s _ a n _ e x a m p l e .

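
For reference, the transformation suggested in the quote could be sketched in Python roughly as follows. This is only an illustration: the function name `to_letter_sequence` and the choice of `_` as the space marker are assumptions, not part of Moses or PISA.

```python
def to_letter_sequence(sentence: str, space_marker: str = "_") -> str:
    """Turn a sentence into space-separated characters.

    Word boundaries are replaced by `space_marker` (hypothetical default "_"),
    so they survive as ordinary tokens on the letter side of the alignment.
    """
    words = sentence.split()
    # Glue the words together with the marker, then put a space
    # between every single character of the resulting string.
    return " ".join(space_marker.join(words))


if __name__ == "__main__":
    print(to_letter_sequence("this is an example."))
    # t h i s _ i s _ a n _ e x a m p l e .
```

The output can then be used as the source side of the training data in place of the word-level sentences.
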
I'm not quite sure what you mean by unsegmented words. My data tends to look something like this (of course more complex and in Czech, but I'll write something by hand):

hello world, this is a test.
h e l o w oe r l d th i s i z a t ae z t

alignment is cool
ae l a i ng m e n t i z k u l

and so on. Now I'm trying to find alignments like:

hello=h e l o
world=w oe r l d
this=th i s
is=i z
a=a
test=t ae z t
alignment=ae l a i ng m e n t
is=i z
cool=k u l

But I believe that what you suggest is already done by the postprocessing tool I use (PISA), only in a much more sophisticated manner. The only real problem for me is non-monotonicity ;(. But thanks for your suggestion, I think I'll try it out nonetheless (I tend to clutch at every straw at this stage ;P).

Thanks and Best Regards
- Dario
