Hi all,

I'm doing some experiments with syntactic translation models and trained a system with sentences of the following format:

<tree label="s" span="0-3"/> <tree label="art" span="0-0"/> <tree label="nn" span="1-1"/> <tree label="np" span="0-3"/> <tree label="appr" span="2-2"/> <tree label="pp" span="2-3"/> <tree label="nn" span="3-3"/> ein tag in üschenen

Training seems to work fine, but during decoding I get this error message:

ERROR: tag tree must span at least one word

I had a look at the relevant bits of code:

scripts/training/phrase-extract/XmlTree.cpp, line 353
moses/src/TreeInput.cpp, line 159
moses/src/XmlOption.cpp, line 282

It turns out that XmlTree.cpp uses '-' as the delimiter when tokenizing the span attribute, while TreeInput.cpp and XmlOption.cpp both use ','. I think standardizing the delimiter would make sense, but I'm afraid that simply replacing one with the other may break other things. Maybe both delimiters could be accepted?
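
To illustrate what I mean by accepting both delimiters, here is a rough sketch in Python. It is not the Moses code: the function name parse_span and its error messages are made up for this example, and the actual fix would have to go into the C++ files listed above. It only shows the idea of splitting the span value on either '-' or ','.

```python
import re


def parse_span(value):
    """Parse a span attribute such as "0-3" or "0,3" into (start, end).

    Hypothetical helper for illustration only; not part of Moses.
    """
    # Split on either delimiter, so both "0-3" and "0,3" are accepted.
    parts = re.split(r"[-,]", value.strip())
    if len(parts) != 2:
        raise ValueError(f"span needs exactly two fields: {value!r}")
    # int() raises ValueError if a field is empty or not a number.
    start, end = int(parts[0]), int(parts[1])
    if end < start:
        raise ValueError(f"span end lies before span start: {value!r}")
    return start, end
```

With something like this, parse_span("0-3") and parse_span("0,3") both yield the pair (0, 3), so input written for either tokenizer would be read the same way.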
