The reality is that the current --xml-input functionality straddles the
fence between the schema-less and defined-schema worlds. It's "<anytag/>
except <wall/> and <zone/> and <ne/>." Moses currently supports only
four functions with XML markup: specifying alternate translations, walls,
zones and named entities. I'm not sure a full XML parser is necessary
for four functions, but the chance of accidental conflicts grows with
the number of functions.
It seems more efficient to assign a tag name to the only current
function that doesn't have a reserved tag name. Then, the undefined tag
names become the exception that Moses ignores.
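
A minimal sketch of that idea, in Python rather than Moses' own C++, and not taken from the Moses source. It assumes the alternate-translation function gets the fixed name "option" (only one of the candidates floated in this thread) and treats every tag outside the reserved set as ignored. The function name classify_tags and the wrapper element <s> are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical reserved names. "option" is just one candidate from this
# thread; wall, zone and ne are the names already reserved today.
RESERVED = {
    "option": "alternate translation",
    "wall": "wall",
    "zone": "zone",
    "ne": "named entity",
}

def classify_tags(sentence):
    """Return (tag, function) pairs; function is None for ignored tags."""
    # Wrap the line in a dummy root so a sentence with mixed text and
    # markup becomes a single well-formed XML document.
    root = ET.fromstring("<s>" + sentence + "</s>")
    return [(el.tag, RESERVED.get(el.tag)) for el in root.iter() if el is not root]

line = 'das ist <option translation="a house">ein Haus</option> <wall/> <b>klein</b>'
for tag, function in classify_tags(line):
    print(tag, "->", function or "ignored")
```

With a closed set like this, an unknown tag such as <b/> is the exception that gets skipped instead of being read as an alternate translation.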
Tom
On 10/16/2013 11:16 PM, Achim Ruopp wrote:
<anytag/> is XML-compliant in schema-less XML (as long as the tag
name complies with http://www.w3.org/TR/REC-xml/#NT-Name).
IMHO Moses input (with the -xml-input option) should stay schema-less,
or we should define a schema. Right now I can't see a pressing reason
to define a schema.
In any case it would be good to parse the input (with the -xml-input
option) with a proper XML parser, e.g.
http://www.boost.org/doc/libs/1_54_0/doc/html/boost_propertytree/parsers.html#boost_propertytree.parsers.xml_parser
There are probably better XML parsers, but Moses already requires
Boost. Using an XML parser could also solve some of the character
escaping uncertainty.
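
To illustrate the escaping point, here is a small sketch that uses Python's standard xml.etree.ElementTree as a stand-in for the Boost parser linked above. It is not Moses code. The helper parse_line, the wrapper element <s>, and the sample sentence are made up for illustration.

```python
import xml.etree.ElementTree as ET

def parse_line(line):
    """Parse one input line; return (plain_text, markup_spans)."""
    # A dummy root element makes mixed text and tags a valid document.
    root = ET.fromstring("<s>" + line + "</s>")
    # itertext() yields all character data with entities already decoded.
    plain = "".join(root.itertext())
    # One (tag, attributes, covered text) triple per top-level element.
    spans = [(el.tag, dict(el.attrib), el.text or "") for el in root]
    return plain, spans

plain, spans = parse_line('AT&amp;T <np translation="R&amp;D">F&amp;E</np> x &lt; y')
print(plain)
print(spans)

# An unescaped ampersand is rejected outright instead of being guessed at.
try:
    parse_line("AT&T")
except ET.ParseError as err:
    print("rejected:", err)
```

The parser decodes &amp; and &lt; the same way in text and in attribute values, and it rejects input that is not well-formed, so the escaping rules are no longer a matter of interpretation.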
Achim
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: Tuesday, October 15, 2013 10:25 PM
To: [email protected]
Subject: Re: [Moses-support] Placeholders
A change from <anytag/> will no doubt disrupt existing pipelines.
Communicating the change with the new release will be a great help.
On 2013-10-15 01:35, Hieu Hoang wrote:
They're good ideas. I'll have a think if I get round to doing it.
I would also want to minimise the work I have to do, and minimise
the disruption to people's existing pipelines.
On 15 October 2013 01:33, Tom Hoar <[email protected]> wrote:
I agree that <anytag/> could cause problems, especially with the growing
list of reserved tag names (ne, wall, zone). I wholeheartedly support a
fixed tag, but I'm not sure "option" is it. What about <np/> (already in
the manual) or <xml-markup/> or <xml-input/> or <moses/>?
Here's another idea. The -xml-input flag supports the values "exclusive,"
"inclusive," "ignore" and "pass-through." What about changing it to a
boolean flag and using the old values as the XML tags: <exclusive/>,
<inclusive/> and <ignore/>? Then one invocation of Moses would support
all modes on a per-sentence basis. Just a thought. I think this would also
be easier if you dropped the "pass-through" option, since there would be
no need for backwards compatibility.
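
A rough sketch of how mode-named tags might be read, again in Python and purely illustrative. It assumes each tag wraps a source span and carries a translation attribute, which is only one possible reading of the proposal. The function name span_modes and the wrapper element <s> are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed tag names, taken from the current -xml-input values
# minus "pass-through".
MODES = {"exclusive", "inclusive", "ignore"}

def span_modes(sentence):
    """Return (mode, source_text, translation) for each mode-tagged span."""
    root = ET.fromstring("<s>" + sentence + "</s>")
    return [
        (el.tag, el.text or "", el.get("translation"))
        for el in root.iter()
        if el.tag in MODES
    ]

line = ('the <exclusive translation="maison">house</exclusive> is '
        '<ignore translation="petit">small</ignore>')
for mode, source, translation in span_modes(line):
    print(mode, source, translation)
```

Because the mode travels with the markup, two sentences in the same input file could use different modes without restarting the decoder.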
Another idea, although on a slightly different subject. Moses'
-monotone-at-punctuation flag would be more useful if we could
define/override the punctuation & symbols that we want it to use. Not
sure how to best accomplish this.
Tom
On 10/15/2013 04:07 AM, Hieu Hoang wrote:
> In fact, we're thinking of changing <anytag/> to something fixed, like
> <option/>
>
> The <anytag/> behaviour isn't good XML and will cause problems in the
> future
>
> Any opinions on this gratefully received
>
_______________________________________________
Moses-support mailing list
[email protected] <mailto:[email protected]>
http://mailman.mit.edu/mailman/listinfo/moses-support
--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu