Huh.  Apparently the lexicalized reordering model can condition on input
with the format 0-0 or in general something like 0,3-1 where there's a
comma separated list of source factors to consider, a dash, and a comma
separated list of target factors to consider.  This will almost
certainly crash when used with PhraseDictionaryMemory.  Is there an easy
way to get the proper source phrase into Hypothesis without depending on
the target phrase?

Also, for the on-disk format where the source phrase is backed by a
cache, are we sure that the source phrase isn't evicted from the cache
before it's used?

Ulterior motive for this: deduplicate target phrase objects, in which
case they won't be able to point to source phrases at all.

Kenneth

On 10/15/2011 02:14 PM, Barry Haddow wrote:
> I guess it doesn't then. I'd have to check. Lexicalised reordering is a
> stateful feature function, so it can access the source phrase via the
> TranslationOption.
> 
> Cheers - Barry
> 
> Sent from my ZX81
> 
> ----- Reply message -----
> From: "Kenneth Heafield" <[email protected]>
> Date: Sat, Oct 15, 2011 11:37
> Subject: [Moses-support] Source phrase
> To: "Barry Haddow" <[email protected]>, <[email protected]>
> Cc: "Hieu Hoang" <[email protected]>
> 
> 
> Doors the reordering model actually use the source phrase? Wouldn't it
> similarly segfault if it did?
> 
> Barry Haddow <[email protected]> wrote:
> 
> Hi Folks
> 
> This bug was discovered here:
> 
> http://www.mail-archive.com/[email protected]/msg03496.html
> 
> but for some reason a fix never made it into trunk. I think I "fixed" it
> in the
> sparse features branch by storing a Phrase object instead of a pointer, but
> obviously that uses extra memory.
> 
> cheers - Barry
> 
> 
> On Saturday 15 Oct 2011 08:58:15 Hieu Hoang wrote:
>> On 15/10/2011 13:41, Kenneth Heafield wrote:
>> > The bug is this line in PhraseDictionaryMemory:
>> >
>> > targetPhrase.SetSourcePhrase(&sourcePhrase);
>> >
>> > That calls this piece of code including a comment that seems to indicate
>> > awareness of the problem:
>> >
>> > //! TODO - why is this needed and is it set correctly by every phrase
>> > dictionary class ? should be set in constructor
>> > void SetSourcePhrase(Phrase const* p) {
>> > m_sourcePhrase=p;
>> > }
>> >
>> > So the pointer to sourcePhrase is stored in TargetPhrase. But
>> > sourcePhrase is a stack-allocated object in
>> > PhraseDictionaryMemory::Load. It has the same address every iteration
>> > and goes out of scope at the end of the function. So every TargetPhrase
>> > object is storing the same dangling pointer.
>>
>> It's the same sourcephrase for sentence input, but different for
>> lattices and confusion networds. The in-memory phrase table doesn't
>> support lattices or confusion net so it's prob not strictly necessary
>> there, but it is in the binary pt implementation. It's a hack dating
>> back to JHU 2006 which never got cleaned up
>>
>> > Furthermore,
>> > PhraseDictionaryMemory does not explicitly store every source Phrase
>> > object (and it shouldn't have to), so there's nothing for these to point
>> > to.
>> >
>> > It seems the correct solution is to remove the pointer from TargetPhrase
>> > and properly track where things came from. Given how deeply embedded
>> > this is into code that isn't mine, I'm not offering to fix it.
>>
>> yeah, prob best to leave it unless it's broken
>>
>> > Kenneth
>> >
>> > On 10/15/11 05:03, Hieu Hoang wrote:
>> >> yep, the source phrase is carried as part of the translation rule
> *only*
>> >> to calculate lexicalized reordering score.
>> >>
>> >> If I can remember correctly, it's implemented slightly differently
>> >> depending on the whether the input is a sentence, confusion net or
>> >> lattice. It's optimised since you can easily get the source phrase from
>> >> a sentence but the source has to be calculated as you go along in the
>> >> other 2 types.
>> >>
>> >> I would prefer it if the implementation was consistent, even at a cost
>> >> of slightly degraded speed.
>> >>
>> >> I've added a regression test
>> >> phrase.lexicalized-reordering-bin
>> >> just in case you decide to play about with it
>> >>
>> >> I don't see the bug, wadda you want me to fix?
>> >>
>> >> On 15/10/2011 03:17, Kenneth Heafield wrote:
>> >>> Actually, the incorrect source phrase is used for more than logging.
>> >>> It makes it all the way to lexical reordering:
>> >>>
>> >>> const Phrase *sourcePhrase = transOpt.GetSourcePhrase();
>> >>> if (sourcePhrase) {
>> >>> Scores score = lexreordering.GetProb(*sourcePhrase,
>> >>> transOpt.GetTargetPhrase());
>> >>> if (!score.empty())
>> >>> transOpt.CacheScores(lexreordering, score);
>> >>>
>> >>> These are parts of Moses far afield. Hieu, the bug goes back to you in
>> >>> 1338. Want to fix it?
>> >>>
>> >>> On 10/14/11 20:48, Kenneth Heafield wrote:
>> >>>> Hi,
>> >>>>
>> >>>> Apparently target phrases keep track of their source phrases? This
>> >>>> is incorrect with e.g. PhraseTableMemory (this is the code from
>> >>>> revision 4364):
>> >>>>
>> >>>> Phrase sourcePhrase(Input, 0);
>> >>>> sourcePhrase.CreateFromString( input, phraseVector);
>> >>>> //target
>> >>>> TargetPhrase targetPhrase(Output);
>> >>>> targetPhrase.SetSourcePhrase(&sourcePhrase);
>> >>>>
>> >>>> I'm currently refactoring it, but have provided the old version to
>> >>>> make it clear this isn't my fault.
>> >>>>
>> >>>> The last line stores a pointer to a stack allocated object in
>> >>>> targetPhrase. Furthermore, the sourcePhrase object does not even
>> >>>> exist in copy form, so there's nothing correct for it to point to.
>> >>>>
>> >>>> I tried to remove it, apparently it makes its way to
> TranslationOption
>> >>>> and Hypothesis, GetSourcePhraseStringRep in particular. That is
>> >>>> called only from Manager.cpp and used for logging purposes in e.g.:
>> >>>>
>> >>>> outputWordGraphStream<< "\tw="<<
>> >>>> hypo->GetSourcePhraseStringRep()<< "|"<<
>> >>>> hypo->GetCurrTargetPhrase();
>> >>>>
>> >>>> On the basis that no logging is better than incorrect logging, I'm
>> >>>> going to stop tracking the source phrase in the target phrase.
>> >>>>
>> >>>> Kenneth
>> >>>>_____________________________________________
> 
>> >>>> Moses-support mailing list
>> >>>> [email protected]
>> >>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>> >>>
>> >>>__________________
> 
> 
> 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support

Reply via email to