[Moses-support] XML syntactic input: inconsistent span notation

Rico Sennrich Wed, 06 Oct 2010 01:57:13 -0700

Hi all,

I'm doing some experiments with syntactic translation models and trained a
system with sentences of the following format:
<tree label="s" span="0-3"/> <tree label="art" span="0-0"/> <tree label="nn" span="1-1"/> <tree label="np" span="0-3"/> <tree label="appr" span="2-2"/> <tree label="pp" span="2-3"/> <tree label="nn" span="3-3"/> ein tag in üschenen


Training seems to work fine, but during decoding, I get this error message: 
ERROR: tag tree must span at least one word

I had a look at the relevant bits of code in:
scripts/training/phrase-extract/XmlTree.cpp, line 353
moses/src/TreeInput.cpp, line 159
moses/src/XmlOption.cpp, line 282

It turns out that XmlTree.cpp splits the span attribute on '-', while
TreeInput.cpp and XmlOption.cpp both split on ','. So the training script
accepts span="0-3", but the decoder expects span="0,3" and rejects the former.

I think standardizing the token delimiter would make sense, but I'm afraid
simply replacing one with the other may break other stuff. Maybe one could allow
both?
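
To make the "allow both" idea concrete, here is a rough sketch of a span
parser that accepts either delimiter. This is not the existing Moses code:
ParseSpan and ParseIndex are hypothetical names, and I'm assuming spans are
inclusive word indices with start <= end. It would need adapting to whatever
tokenization helper the three files actually use.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not in Moses): convert a non-empty, all-digit string
// to an index. Rejects anything longer than 9 digits to avoid overflow.
static bool ParseIndex(const std::string& text, std::size_t& value) {
  if (text.empty() || text.size() > 9) return false;
  std::size_t result = 0;
  for (std::string::size_type i = 0; i < text.size(); ++i) {
    const char c = text[i];
    if (c < '0' || c > '9') return false;
    result = result * 10 + static_cast<std::size_t>(c - '0');
  }
  value = result;
  return true;
}

// Hypothetical helper (not in Moses): parse a span attribute written either
// as "2-3" or as "2,3". Returns false on malformed input or if end < start
// (assumption: spans are inclusive, so "0-0" covers exactly one word).
bool ParseSpan(const std::string& span, std::size_t& start, std::size_t& end) {
  const std::string::size_type sep = span.find_first_of("-,");
  if (sep == std::string::npos) return false;

  std::size_t s = 0, e = 0;
  if (!ParseIndex(span.substr(0, sep), s)) return false;
  if (!ParseIndex(span.substr(sep + 1), e)) return false;
  if (e < s) return false;

  start = s;
  end = e;
  return true;
}
```

With something like this, ParseSpan("2-3", start, end) and
ParseSpan("2,3", start, end) would both give start = 2 and end = 3, so
existing input files in either notation would keep working.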

