Re: [Moses-support] XML syntactic input: inconsistent span notation

Hieu Hoang Wed, 06 Oct 2010 03:41:50 -0700

Hi Rico,

I agree with you that it should be standardised. I don't use this
notation myself, so I don't mind how it's done; unless someone else
objects, you're free to standardise it however you wish.


On 06/10/2010 09:55, Rico Sennrich wrote:
> Hi all,
>
> I'm doing some experiments with syntactic translation models and trained a
> system with sentences of the following format:
> <tree label="s" span="0-3"/> <tree label="art" span="0-0"/> <tree label="nn" span="1-1"/> <tree label="np" span="0-3"/> <tree label="appr" span="2-2"/> <tree label="pp" span="2-3"/> <tree label="nn" span="3-3"/> ein tag in üschenen
>
> Training seems to work fine, but during decoding, I get this error message:
> ERROR: tag tree must span at least one word
>
> I had a look at the relevant bit of code in
> scripts/training/phrase-extract/XmlTree.cpp, line 353
> moses/src/TreeInput.cpp, line 159
> moses/src/XmlOption.cpp, line 282
>
> It turns out that XmlTree.cpp uses '-' to tokenize the span parameter,
> while TreeInput.cpp and XmlOption.cpp both use ','.
>
> I think standardizing the token delimiter would make sense, but I'm afraid
> simply replacing one with the other may break other stuff. Maybe one could
> allow both?
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
