Hi Kenneth
I looked at the lex reordering code you pointed at earlier, and it gets the
source phrase from the TranslationOption. I think this copy is ok, as (for
sentence input) it doesn't come from the TargetPhrase, it is copied from the
Sentence:
if (inputType.GetType() == SentenceInput) {
Phrase phrase = inputType.GetSubString(wordsRange);
m_sourcePhrase = new Phrase(phrase);
}
So I don't think the source phrase member of TargetPhrase is actually used
anywhere - just as well since it's a dangling pointer. However if you
implement a new stateless feature and want to access the source phrase then
you're stuck. Ideally, both stateful and stateless feature functions would
take a TranslationOption in Evaluate(), but that would be a disruptive
change.
cheers - Barry
> Also, for the on-disk format where the source phrase is backed by a
> cache, are we sure that the source phrase isn't evicted from the cache
> before it's used?
On Saturday 15 Oct 2011 21:50:45 Kenneth Heafield wrote:
> Huh. Apparently the lexicalized reordering model can condition on input
> with the format 0-0 or in general something like 0,3-1 where there's a
> comma separated list of source factors to consider, a dash, and a comma
> separated list of target factors to consider. This will almost
> certainly crash when used with PhraseDictionaryMemory. Is there an easy
> way to get the proper source phrase into Hypothesis without depending on
> the target phrase?
>
> Also, for the on-disk format where the source phrase is backed by a
> cache, are we sure that the source phrase isn't evicted from the cache
> before it's used?
>
> Ulterior motive for this: deduplicate target phrase objects, in which
> case they won't be able to point to source phrases at all.
>
> Kenneth
>
> On 10/15/2011 02:14 PM, Barry Haddow wrote:
> > I guess it doesn't then. I'd have to check. Lexicalised reordering is a
> > stateful feature function, so it can access the source phrase via the
> > TranslationOption.
> >
> > Cheers - Barry
> >
> > Sent from my ZX81
> >
> > ----- Reply message -----
> > From: "Kenneth Heafield" <[email protected]>
> > Date: Sat, Oct 15, 2011 11:37
> > Subject: [Moses-support] Source phrase
> > To: "Barry Haddow" <[email protected]>, <[email protected]>
> > Cc: "Hieu Hoang" <[email protected]>
> >
> >
> > Doors the reordering model actually use the source phrase? Wouldn't it
> > similarly segfault if it did?
> >
> > Barry Haddow <[email protected]> wrote:
> >
> > Hi Folks
> >
> > This bug was discovered here:
> >
> > http://www.mail-archive.com/[email protected]/msg03496.html
> >
> > but for some reason a fix never made it into trunk. I think I "fixed" it
> > in the
> > sparse features branch by storing a Phrase object instead of a pointer,
> > but obviously that uses extra memory.
> >
> > cheers - Barry
> >
> > On Saturday 15 Oct 2011 08:58:15 Hieu Hoang wrote:
> >> On 15/10/2011 13:41, Kenneth Heafield wrote:
> >> > The bug is this line in PhraseDictionaryMemory:
> >> >
> >> > targetPhrase.SetSourcePhrase(&sourcePhrase);
> >> >
> >> > That calls this piece of code including a comment that seems to
> >> > indicate awareness of the problem:
> >> >
> >> > //! TODO - why is this needed and is it set correctly by every phrase
> >> > dictionary class ? should be set in constructor
> >> > void SetSourcePhrase(Phrase const* p) {
> >> > m_sourcePhrase=p;
> >> > }
> >> >
> >> > So the pointer to sourcePhrase is stored in TargetPhrase. But
> >> > sourcePhrase is a stack-allocated object in
> >> > PhraseDictionaryMemory::Load. It has the same address every iteration
> >> > and goes out of scope at the end of the function. So every
> >> > TargetPhrase object is storing the same dangling pointer.
> >>
> >> It's the same sourcephrase for sentence input, but different for
> >> lattices and confusion networds. The in-memory phrase table doesn't
> >> support lattices or confusion net so it's prob not strictly necessary
> >> there, but it is in the binary pt implementation. It's a hack dating
> >> back to JHU 2006 which never got cleaned up
> >>
> >> > Furthermore,
> >> > PhraseDictionaryMemory does not explicitly store every source Phrase
> >> > object (and it shouldn't have to), so there's nothing for these to
> >> > point to.
> >> >
> >> > It seems the correct solution is to remove the pointer from
> >> > TargetPhrase and properly track where things came from. Given how
> >> > deeply embedded this is into code that isn't mine, I'm not offering to
> >> > fix it.
> >>
> >> yeah, prob best to leave it unless it's broken
> >>
> >> > Kenneth
> >> >
> >> > On 10/15/11 05:03, Hieu Hoang wrote:
> >> >> yep, the source phrase is carried as part of the translation rule
> >
> > *only*
> >
> >> >> to calculate lexicalized reordering score.
> >> >>
> >> >> If I can remember correctly, it's implemented slightly differently
> >> >> depending on the whether the input is a sentence, confusion net or
> >> >> lattice. It's optimised since you can easily get the source phrase
> >> >> from a sentence but the source has to be calculated as you go along
> >> >> in the other 2 types.
> >> >>
> >> >> I would prefer it if the implementation was consistent, even at a
> >> >> cost of slightly degraded speed.
> >> >>
> >> >> I've added a regression test
> >> >> phrase.lexicalized-reordering-bin
> >> >> just in case you decide to play about with it
> >> >>
> >> >> I don't see the bug, wadda you want me to fix?
> >> >>
> >> >> On 15/10/2011 03:17, Kenneth Heafield wrote:
> >> >>> Actually, the incorrect source phrase is used for more than logging.
> >> >>> It makes it all the way to lexical reordering:
> >> >>>
> >> >>> const Phrase *sourcePhrase = transOpt.GetSourcePhrase();
> >> >>> if (sourcePhrase) {
> >> >>> Scores score = lexreordering.GetProb(*sourcePhrase,
> >> >>> transOpt.GetTargetPhrase());
> >> >>> if (!score.empty())
> >> >>> transOpt.CacheScores(lexreordering, score);
> >> >>>
> >> >>> These are parts of Moses far afield. Hieu, the bug goes back to you
> >> >>> in 1338. Want to fix it?
> >> >>>
> >> >>> On 10/14/11 20:48, Kenneth Heafield wrote:
> >> >>>> Hi,
> >> >>>>
> >> >>>> Apparently target phrases keep track of their source phrases? This
> >> >>>> is incorrect with e.g. PhraseTableMemory (this is the code from
> >> >>>> revision 4364):
> >> >>>>
> >> >>>> Phrase sourcePhrase(Input, 0);
> >> >>>> sourcePhrase.CreateFromString( input, phraseVector);
> >> >>>> //target
> >> >>>> TargetPhrase targetPhrase(Output);
> >> >>>> targetPhrase.SetSourcePhrase(&sourcePhrase);
> >> >>>>
> >> >>>> I'm currently refactoring it, but have provided the old version to
> >> >>>> make it clear this isn't my fault.
> >> >>>>
> >> >>>> The last line stores a pointer to a stack allocated object in
> >> >>>> targetPhrase. Furthermore, the sourcePhrase object does not even
> >> >>>> exist in copy form, so there's nothing correct for it to point to.
> >> >>>>
> >> >>>> I tried to remove it, apparently it makes its way to
> >
> > TranslationOption
> >
> >> >>>> and Hypothesis, GetSourcePhraseStringRep in particular. That is
> >> >>>> called only from Manager.cpp and used for logging purposes in e.g.:
> >> >>>>
> >> >>>> outputWordGraphStream<< "\tw="<<
> >> >>>> hypo->GetSourcePhraseStringRep()<< "|"<<
> >> >>>> hypo->GetCurrTargetPhrase();
> >> >>>>
> >> >>>> On the basis that no logging is better than incorrect logging, I'm
> >> >>>> going to stop tracking the source phrase in the target phrase.
> >> >>>>
> >> >>>> Kenneth
> >> >>>>_____________________________________________
> >> >>>>
> >> >>>> Moses-support mailing list
> >> >>>> [email protected]
> >> >>>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >> >>>
> >> >>>__________________
> >
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support