<anytag/> is XML-compliant in schema-less XML (as long as the tag name conforms to http://www.w3.org/TR/REC-xml/#NT-Name).
IMHO Moses input (with the -xml-input option) should stay schema-less, or we should define a schema. Right now I can't see a pressing reason to define a schema.

In any case it would be good to parse the input (with the -xml-input option) with a proper XML parser, e.g.
http://www.boost.org/doc/libs/1_54_0/doc/html/boost_propertytree/parsers.html#boost_propertytree.parsers.xml_parser

There are probably better XML parsers, but Moses already requires Boost. Using an XML parser could also solve some of the character escaping uncertainty.

Achim

From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: Tuesday, October 15, 2013 10:25 PM
To: [email protected]
Subject: Re: [Moses-support] Placeholders

A change from <anytag/> will no-doubt disrupt existing pipelines. Communicating the change with the new release will be a great help.

On 2013-10-15 01:35, Hieu Hoang wrote:

they're good ideas. I'll have a think if I get round to doing it. Would also want to minimise the work I have to do, and minimize the disruption to people's existing pipeline.

On 15 October 2013 01:33, Tom Hoar <[email protected]> wrote:

I agree that <anytag/> could cause problems, especially with the growing list of reserved tag names (ne, wall, zone). I wholeheartedly support a fixed tag, but I'm not sure "option" is it. What about <np/> (already in the manual) or <xml-markup/> or <xml-input/> or <moses/>?

Here's another idea. The -xml-input flag supports values "exclusive," "inclusive," "ignore" and "pass-through." What about changing the flag to a boolean flag. Then, use the value as the xml tags: <exclusive/>, <inclusive/> and <ignore/> so the one invocation of Moses would support all modes on a per-sentence basis. Just a thought. Think this would also be easier if you dropped the "pass-through" option because no need for backwards compatibility.

Another idea, although slightly different subject. Moses' -monotone-at-punctuation flag would be more useful if we could define/override the punctuation & symbols that we want it to use. Not sure how to best accomplish this.

Tom

On 10/15/2013 04:07 AM, Hieu Hoang wrote:
> In fact, we're thinking of changing <anytag/> to something fixed, like
> <option/>
>
> The <anytag/> behaviour isn't good XML and will cause problems in the
> future
>
> Any opinions on this gratefully received
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
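To make the "proper XML parser" suggestion in this thread concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree rather than Boost. It is not Moses code. The function names, the wrapper root element <s> and the sample input lines are all assumptions made for illustration. It shows two things: any tag whose name is a valid XML Name parses without a schema, and the parser resolves character escapes.

```python
import xml.etree.ElementTree as ET


def parse_markup(line):
    """Split one input line into (text, tag, attributes) segments, in order.

    tag is None for plain text outside any element. Raises ET.ParseError
    when the line is not well-formed XML. Only top-level elements are
    reported; the text of nested elements is flattened into their parent.
    """
    # An input sentence has no single root element, so wrap it in one.
    # The wrapper name "s" is an arbitrary choice for this sketch.
    root = ET.fromstring("<s>" + line + "</s>")
    segments = []
    if root.text:
        segments.append((root.text, None, {}))
    for child in root:
        segments.append(("".join(child.itertext()), child.tag, dict(child.attrib)))
        if child.tail:
            segments.append((child.tail, None, {}))
    return segments


def is_well_formed(line):
    """Return True if the line parses as XML content, False otherwise."""
    try:
        parse_markup(line)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Illustrative input line, not taken from this thread.
    for segment in parse_markup('das ist <np translation="a house">ein haus</np> .'):
        print(segment)
    print(is_well_formed("<anytag/>"))  # any valid XML Name is accepted
    print(is_well_formed("AT&T"))       # a bare ampersand is rejected
```

With a parser in the loop, the escaping question becomes a well-formedness question: input must write &amp; and &lt; for literal characters, and the parser hands back the unescaped text and attribute values.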
