hey arda

fyi, below

-------- Original Message --------
Subject:        Re: R: R: Alignment information in phrase table?
Date:   Sun, 18 Jul 2010 21:08:26 +0100
From:   Hieu Hoang <[email protected]>
To:     Christian Hardmeier <[email protected]>
CC: [email protected] <[email protected]>, Yu Chen <[email protected]>, Philipp Koehn <[email protected]>, Andreas Eisele <[email protected]>, Nicola Bertoldi <[email protected]>, Philip Williams <[email protected]>, [email protected], [email protected]



hi guys

christian & i were talking @ acl & thought it would be a good idea to
put the alignment info back into the phrase table. This time, we've
thought a little about it and will try & support it so it doesn't fall out again.

when you run the ./score part of the phrase extraction, add the argument
   --WordAlignment
and it'll copy the alignment info from the giza++ files to the phrase table.

The format of the final phrase table has changed a bit, so apologies if
that messes up your scripts. The old format was meant to do some other
things, and we didn't think about memory consumption or speed. This new
format is simpler & should be ok, and is also the same as the
moses-chart decoding format so you can swap in the hiero/syntax stuff
with little effort.

It's now:
   source ||| target ||| alignment ||| scores ||| counts
eg.
   Mushariff letzer Act ? ||| Mushariff 's last act ? ||| 0-0 0-1 1-2 2-3 3-4 ||| 0.5 0.414 0.2343 0.2354 2.718 ||| 14 12
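
A minimal sketch of reading one line in the five-field layout above. The names `PhraseEntry` and `parse_phrase_table_line` are hypothetical and not part of Moses. The sketch assumes each alignment point is `source-target` (which fits the example, where source word 0 links to target words 0 and 1) and reads the counts as floats, since the message does not say what type they are.

```python
from typing import List, NamedTuple, Tuple


class PhraseEntry(NamedTuple):
    """One phrase-table line (hypothetical container, not a Moses type)."""
    source: List[str]
    target: List[str]
    alignment: List[Tuple[int, int]]  # assumed (source index, target index)
    scores: List[float]
    counts: List[float]


def parse_phrase_table_line(line: str) -> PhraseEntry:
    # Fields are separated by "|||"; surrounding whitespace is dropped.
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d" % len(fields))
    source, target, alignment, scores, counts = fields

    points = []
    for point in alignment.split():
        src, tgt = point.split("-")
        points.append((int(src), int(tgt)))

    return PhraseEntry(
        source=source.split(),
        target=target.split(),
        alignment=points,
        scores=[float(s) for s in scores.split()],
        counts=[float(c) for c in counts.split()],
    )


example = ("Mushariff letzer Act ? ||| Mushariff 's last act ? ||| "
           "0-0 0-1 1-2 2-3 3-4 ||| 0.5 0.414 0.2343 0.2354 2.718 ||| 14 12")
entry = parse_phrase_table_line(example)
print(entry.alignment)
```

A line with the wrong number of fields raises `ValueError`, so scripts written for the old format fail loudly rather than silently misreading a field.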

let's see what happens. we should let the wider audience know once you
guys kick the tires on it.

On 09/07/2010 13:47, Christian Hardmeier wrote:
 You're right about that thread, strange. I wonder if this wasn't fixed at some 
point. I hope it was, at least, because I never noticed anything like that. :-) 
Anyhow I'm sure that memscore doesn't do this.

 I regularly use the word alignments, so if there's a serious problem, I hope 
I'll notice, and I will need to fix it, I suppose... Would be cool if you could 
add it back in!

 I'm coming to ACL, so we can talk about details next week.

 Thanks,
 Christian

 ________________________________________
 From: Hieu Hoang [[email protected]]
 Sent: Friday, 9 July 2010 14:28
 To: Christian Hardmeier
 Cc: [email protected]; Yu Chen; Philipp Koehn; Andreas Eisele; Nicola Bertoldi
 Subject: Re: R: Alignment information in phrase table?

 i think you have a point, it's a popular & useful feature that should be
 put back in.

 btw, i found a thread about differences in training from 2 yrs ago:
 http://article.gmane.org/gmane.comp.nlp.moses.user/1267
 i'm not sure if the problem still exists, i was surprised by it as well,
 but i'm reluctant to do anything that reduces performance.

 i think adding it into the training as an option would be easy and i can
 add it in. However, i don't use it so any problems would slip under my
 radar. You guys want to look after it once it's in?

 Are you coming to ACL? can talk about it then, or by skype afterwards if
 we want to go ahead with it.

 On 09/07/2010 10:03, Christian Hardmeier wrote:

 Hi Hieu

 I don't have much to say about word alignments in the decoder - since I've 
found out that it's quite easy to obtain word alignments by putting the 
alignment info in a second factor in the phrase table, I don't need special 
code in the decoder to deal with this.

 However, in my opinion removing the alignments from the training scripts was a 
serious mistake. At the very least, they should be made optional. Why do you 
want to remove working functionality that many people want to use (witness the 
frequent requests for this feature on the mailing list) just because it may 
produce slightly inferior probability estimates in a few cases? I'm completely 
dependent on the word alignments for my current work, and if they're not output 
any more, this means I can't upgrade to the latest trunk, which is a hassle.

 By the way, I don't even think the problem with conflicting alignments really 
exists. I'm sure you don't get duplicate entries in the phrase table if you use 
memscore, and I would be rather surprised to find out you do with the classical 
training code. In fact, this problem is discussed in my paper for the last MT 
Marathon: http://www.mt-archive.info/MTMarathon-2010-Hardmeier.pdf
 The second paragraph of section 2.2 tells you what memscore does when faced 
with conflicting alignments:
 "When a phrase pair occurs with different alignments in the input, the most 
frequent alignment is output. Ties are broken arbitrarily."
 I don't remember exactly what Philipp's scripts do, but I believe it's 
something similar.
 The last paragraph of section 2.1 contains a discussion about the computation 
of lexical weight scores in the presence of conflicting alignments. Here, 
memscore behaves slightly differently from Philipp's scripts, but neither of 
them outputs duplicate entries.
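
 The quoted rule (output the most frequent alignment for each phrase pair) can be sketched as below. This is only an illustration of that sentence, not memscore's actual code. The function name and the input layout, a list of (source, target, alignment) tuples, are assumptions. Ties here go to whichever alignment was seen first, which is one possible way of breaking them "arbitrarily".

```python
from collections import Counter, defaultdict


def pick_most_frequent_alignment(extracted):
    """Map each (source, target) pair to its most frequent alignment.

    `extracted` is an iterable of (source, target, alignment) string
    tuples, one per extracted phrase-pair occurrence (assumed layout).
    """
    alignment_counts = defaultdict(Counter)
    for source, target, alignment in extracted:
        alignment_counts[(source, target)][alignment] += 1

    # One entry per phrase pair, so no duplicate phrase-table lines.
    return {
        pair: counter.most_common(1)[0][0]
        for pair, counter in alignment_counts.items()
    }


occurrences = [
    ("a b", "A B", "0-0 1-1"),
    ("a b", "A B", "0-0 0-1 1-1"),
    ("a b", "A B", "0-0 1-1"),
]
best = pick_most_frequent_alignment(occurrences)
print(best)
```

 With the three occurrences above, the pair "a b" / "A B" gets a single entry with alignment "0-0 1-1", because that alignment occurs twice and the other only once.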

 Couldn't you ask whoever removed word alignments from the training to roll 
back this change please? If they absolutely don't want to have the alignments 
for whatever reason, they should add a switch, but not just delete code some 
people are using.

 Cheers,
 Christian

 ________________________________________
 From: Hieu Hoang [[email protected]]
 Sent: Thursday, 8 July 2010 13:05
 To: [email protected]
 Cc: Yu Chen; Philipp Koehn; Christian Hardmeier; Andreas Eisele; Nicola Bertoldi
 Subject: Re: Alignment information in phrase table?

 Hi Tracey

 there were problems with memory consumption & slowness in the decoder.
 Josh tried to contain that about 2 yrs ago by only loading alignment
 when it was needed.
 http://mosesdecoder.svn.sourceforge.net/viewvc/mosesdecoder?view=revision&sortby=file&revision=1941
 however, the implementation was still unnecessarily memory hungry & slow.

 We also noticed that there were small differences in the training
 routine with the alignment info. I can't remember the details & i can't
 find the emails, but it goes something like:
       If there are 2 phrase pairs in the training corpus that have exactly
 the same source & target, but only differ in the alignment, eg
            a b ||| A B ||| 0-0 1-1
            a b ||| A B ||| 0-0 0-1 1-1
       then the training routines will create 2 entries in the phrase table.
 This makes decoding slightly worse so we rolled it back too.

 the main problem with it was that nicola & i had a hand in it but
 neither of us really looked after the code. When bugs were found, it was
 easier to rollback than fix the problem.

 the current decoder once again carries alignment info which is used to
 store the co-index for the hiero/syntax system, but it can store word
 level alignment too. It's built into the new on-disk pt format, and
 isn't too memory hungry.

 i think it'll be nice to have the alignment info back in. But someone
 has to take charge and be prepared to fix it if bugs get found

 On 08/07/2010 10:57, Yu Chen wrote:


 Dear Philipp and Hieu,

 I just noticed the model training script in moses no longer outputs the
 best alignment information for non-hierarchical phrase pairs in the
 phrase table. (line 447 in
 $MOSESTRUNK/scripts/training/phrase-extract/score.cpp) Besides, the
 options for the decoder to print out the word alignment information
 have been disabled for a while.
 (http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc6)

 Is there a particular reason for doing so? I figured it would be better to
 ask you before sending this question to the mailing list. This function
 is fairly essential in our hybrid setup. Although we still have the
 previous version, it would be problematic for us to try out new
 features in moses. Looking forward to your answers! :)

 Best regards,
 Yu







