I guess it doesn't then. I'd have to check. Lexicalised reordering is a
stateful feature function, so it can access the source phrase via the
TranslationOption.
Cheers - Barry
Sent from my ZX81
----- Reply message -----
From: "Kenneth Heafield" <[email protected]>
Date: Sat, Oct 15, 2011 11:37
Subject: [Moses-support] Source phrase
To: "Barry Haddow" <[email protected]>, <[email protected]>
Cc: "Hieu Hoang" <[email protected]>
Doors the reordering model actually use the source phrase? Wouldn't it
similarly segfault if it did?
Barry Haddow <[email protected]> wrote:
Hi Folks
This bug was discovered here:
http://www.mail-archive.com/[email protected]/msg03496.html
but for some reason a fix never made it into trunk. I think I "fixed" it in the
sparse features branch by storing a Phrase object instead of a pointer, but
obviously that uses extra memory.
cheers - Barry
On Saturday 15 Oct 2011 08:58:15 Hieu Hoang wrote:
> On 15/10/2011 13:41, Kenneth Heafield wrote:
> > The bug is this line in PhraseDictionaryMemory:
> >
> > targetPhrase.SetSourcePhrase(&sourcePhrase);
> >
> > That calls this piece of code including a comment that seems to indicate
> > awareness of the problem:
> >
> > //! TODO - why is this needed and is it set correctly by every phrase
> > dictionary class ? should be set in constructor
> > void SetSourcePhrase(Phrase const* p) {
> > m_sourcePhrase=p;
> > }
> >
> > So the pointer to sourcePhrase is stored in TargetPhrase. But
> > sourcePhrase is a stack-allocated object in
> > PhraseDictionaryMemory::Load. It has the same address every iteration
> > and goes out of scope at the end of the function. So every TargetPhrase
> > object is storing the same dangling pointer.
>
> It's the same sourcephrase for sentence input, but different for
> lattices and confusion networds. The in-memory phrase table doesn't
> support lattices or confusion net so it's prob not strictly necessary
> there, but it is in the binary pt implementation. It's a hack dating
> back to JHU 2006 which never got cleaned up
>
> > Furthermore,
> > PhraseDictionaryMemory does not explicitly store every source Phrase
> > object (and it shouldn't have to), so there's nothing for these to point
> > to.
> >
> > It seems the correct solution is to remove the pointer from TargetPhrase
> > and properly track where things came from. Given how deeply embedded
> > this is into code that isn't mine, I'm not offering to fix it.
>
> yeah, prob best to leave it unless it's broken
>
> > Kenneth
> >
> > On 10/15/11 05:03, Hieu Hoang wrote:
> >> yep, the source phrase is carried as part of the translation rule *only*
> >> to calculate lexicalized reordering score.
> >>
> >> If I can remember correctly, it's implemented slightly differently
> >> depending on the whether the input is a sentence, confusion net or
> >> lattice. It's optimised since you can easily get the source phrase from
> >> a sentence but the source has to be calculated as you go along in the
> >> other 2 types.
> >>
> >> I would prefer it if the implementation was consistent, even at a cost
> >> of slightly degraded speed.
> >>
> >> I've added a regression test
> >> phrase.lexicalized-reordering-bin
> >> just in case you decide to play about with it
> >>
> >> I don't see the bug, wadda you want me to fix?
> >>
> >> On 15/10/2011 03:17, Kenneth Heafield wrote:
> >>> Actually, the incorrect source phrase is used for more than logging.
> >>> It makes it all the way to lexical reordering:
> >>>
> >>> const Phrase *sourcePhrase = transOpt.GetSourcePhrase();
> >>> if (sourcePhrase) {
> >>> Scores score = lexreordering.GetProb(*sourcePhrase,
> >>> transOpt.GetTargetPhrase());
> >>> if (!score.empty())
> >>> transOpt.CacheScores(lexreordering, score);
> >>>
> >>> These are parts of Moses far afield. Hieu, the bug goes back to you in
> >>> 1338. Want to fix it?
> >>>
> >>> On 10/14/11 20:48, Kenneth Heafield wrote:
> >>>> Hi,
> >>>>
> >>>> Apparently target phrases keep track of their source phrases? This
> >>>> is incorrect with e.g. PhraseTableMemory (this is the code from
> >>>> revision 4364):
> >>>>
> >>>> Phrase sourcePhrase(Input, 0);
> >>>> sourcePhrase.CreateFromString( input, phraseVector);
> >>>> //target
> >>>> TargetPhrase targetPhrase(Output);
> >>>> targetPhrase.SetSourcePhrase(&sourcePhrase);
> >>>>
> >>>> I'm currently refactoring it, but have provided the old version to
> >>>> make it clear this isn't my fault.
> >>>>
> >>>> The last line stores a pointer to a stack allocated object in
> >>>> targetPhrase. Furthermore, the sourcePhrase object does not even
> >>>> exist in copy form, so there's nothing correct for it to point to.
> >>>>
> >>>> I tried to remove it, apparently it makes its way to TranslationOption
> >>>> and Hypothesis, GetSourcePhraseStringRep in particular. That is
> >>>> called only from Manager.cpp and used for logging purposes in e.g.:
> >>>>
> >>>> outputWordGraphStream<< "\tw="<<
> >>>> hypo->GetSourcePhraseStringRep()<< "|"<<
> >>>> hypo->GetCurrTargetPhrase();
> >>>>
> >>>> On the basis that no logging is better than incorrect logging, I'm
> >>>> going to stop tracking the source phrase in the target phrase.
> >>>>
> >>>> Kenneth
> >>>>_____________________________________________
> >>>> Moses-support mailing list
> >>>> [email protected]
> >>>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>>
> >>>__________________The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support