Make sure PISA really splits words into letters. I have done something quite similar (there were also literature pointers earlier in this thread) and it worked really well, so I do not see any reason why it should suddenly not work for your setup. One possible problem with this approach is that you will very quickly exceed the 100-word-per-sentence limit of Giza if you use letters and phonemes as tokens.
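
To check what the letter-level input should look like, here is a minimal Python sketch. It is not PISA's or Giza's own code: the names to_letters, fits_limit and MAX_TOKENS are made up for illustration, and the value 100 is simply the limit mentioned above.

```python
# Assumed token limit per sentence (the figure mentioned above).
MAX_TOKENS = 100


def to_letters(sentence, space_symbol="_"):
    """Split a sentence into space-separated letters.

    Word boundaries are replaced by space_symbol so that they
    survive the segmentation as ordinary tokens.
    """
    words = sentence.split()
    return " ".join(space_symbol.join(words))


def fits_limit(segmented, max_tokens=MAX_TOKENS):
    """Return True if the segmented line has at most max_tokens tokens."""
    return len(segmented.split()) <= max_tokens


if __name__ == "__main__":
    line = to_letters("this is an example.")
    print(line)
    print(len(line.split()), fits_limit(line))
```

Running every line of the corpus through something like fits_limit before alignment shows how many sentences would be over the limit once they are split into letters.
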

On 24.08.2012 16:56, Dario Ernst wrote:
> Hey Marcin,
>
> On 08/24/2012 12:52 PM, Marcin Junczys-Dowmunt wrote:
>> You are aligning unsegmented words to segmented phoneme sequences, do I
>> understand that correctly? Maybe it's worth using letter sequences
>> instead of words and replacing spaces with special characters.
>> Like this:
>>
>> t h i s _ i s _ a n _ e x a m p l e .
>
> I'm not quite sure what you mean by unsegmented words? But my data tends
> to look something like this (of course more complex and in Czech, but
> I'll write something by hand):
>
> hello world, this is a test.
> h e l o w oe r l d th i s i z a t ae z t
> alignment is cool
> ae l a i ng m e n t i z k u l
>
> and so on. Now I'm trying to find alignments like:
> hello=h e l o
> world=w oe r l d
> this=th i s
> is=i z
> a=a
> test=t ae z t
> alignment=ae l a i ng m e n t
> is=i z
> cool=k u l
>
> But I believe that what you suggest is already done by the
> postprocessing tool I use (PISA), only in a much more sophisticated
> manner. The only problem is non-monotonicity for me, really ;(. But thanks
> for your suggestion, I think I'll try it out nonetheless (I tend to
> clutch at every straw at this stage ;P).
>
> Thanks and Best Regards
> - Dario
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
