Hi Rico, I agree with you that it should be standardised. I don't use this notation, so I don't mind how it's done; unless someone else does, you're free to standardise it however you wish.
On 06/10/2010 09:55, Rico Sennrich wrote:
> Hi all,
>
> I'm doing some experiments with syntactic translation models and trained a
> system with sentences of the following format:
> <tree label="s" span="0-3"/> <tree label="art" span="0-0"/> <tree label="nn" span="1-1"/> <tree label="np" span="0-3"/> <tree label="appr" span="2-2"/> <tree label="pp" span="2-3"/> <tree label="nn" span="3-3"/> ein tag in üschenen
>
> Training seems to work fine, but during decoding, I get this error message:
> ERROR: tag tree must span at least one word
>
> I had a look at the relevant bit of code in
> scripts/training/phrase-extract/XmlTree.cpp, line 353
> moses/src/TreeInput.cpp, line 159
> moses/src/XmlOption.cpp, line 282
>
> Turns out that XmlTree.cpp uses '-' for tokenization of the span parameter,
> while TreeInput.cpp and XmlOption.cpp both use ','.
>
> I think standardizing the token delimiter would make sense, but I'm afraid
> simply replacing one with the other may break other stuff. Maybe one could
> allow both?
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
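
For illustration only, here is a rough sketch of the "allow both" idea from the quoted message: a span parser that accepts either '-' or ',' as the delimiter. ParseSpan is a hypothetical helper and is not taken from XmlTree.cpp, TreeInput.cpp or XmlOption.cpp; the real code there may tokenize and validate differently. The sketch assumes both numbers are non-negative integers and that the end position is not smaller than the start position.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Moses): parses a span attribute such as
// "0-3" or "0,3" into start and end positions.
// Returns false, leaving start and end untouched, if the input is malformed.
// Note: std::stoul may still throw std::out_of_range for absurdly long numbers.
bool ParseSpan(const std::string& span, std::size_t& start, std::size_t& end)
{
  // Accept either delimiter, so input written for one component
  // is also readable by the other.
  const std::size_t pos = span.find_first_of("-,");
  if (pos == std::string::npos) return false;

  const std::string first = span.substr(0, pos);
  const std::string second = span.substr(pos + 1);

  // Each side must be a non-empty run of decimal digits.
  auto allDigits = [](const std::string& s) {
    return !s.empty() &&
           s.find_first_not_of("0123456789") == std::string::npos;
  };
  if (!allDigits(first) || !allDigits(second)) return false;

  const std::size_t parsedStart = std::stoul(first);
  const std::size_t parsedEnd = std::stoul(second);

  // Assumption: a span must cover at least one word, so end >= start.
  if (parsedEnd < parsedStart) return false;

  start = parsedStart;
  end = parsedEnd;
  return true;
}
```

With something along these lines, span="2-3" and span="2,3" would be read identically, so existing input in either notation would keep working while one delimiter is phased in as the standard.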
