On 24 December 2013 15:34, Mikel Forcada <m...@dlsi.ua.es> wrote:
> Hi all,
>
> As part of my work with students in the Google Code-In (notably
> galaxyfeeder) I have found a limitation in the current design of
> Apertium as regards the handling of format tags (encapsulated as
> superblanks).
>
> I would appreciate it very much if someone has time to turn this
> message into a proper bug report, although, as will be seen, it is a
> design limitation rather than a bug.
>
> Since transfer rules (.t1x, .t2x) have to move superblanks around
> explicitly, it may be the case that valid HTML or XML is rendered
> invalid. For instance, a translated ODT file may not open, or a
> translated XHTML page may not be valid.
>

This is a known issue (e.g., Jacob mentions it in this thread from
2009: 
http://sourceforge.net/mailarchive/forum.php?thread_name=20cf28cd0904300204v45f35e51i118f4d146f83748%40mail.gmail.com&forum_name=apertium-stuff)

> For instance a rule can move around <b pos="1"/> and <b pos="2"/>. If <b
> pos="1"/> is "<sometag>" and <b pos="2"/> is  "</sometag>", the result
> is that </sometag> comes before <sometag>, leading to invalid XML or HTML.
>
> Similar validity errors may be introduced when tags are lost or repeated.
>
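
To make the failure concrete, here is a minimal Python sketch of a nesting check over the tags carried by a sequence of superblanks. The function name and the regular expression are illustrative assumptions, not part of any Apertium tool; the check only looks at open/close order and ignores everything else an XML validator would test.

```python
import re

# Hypothetical helper, not an Apertium API: matches <tag>, </tag> and <tag/>.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*?(/?)>")

def is_well_nested(blanks):
    """Return True if the tags in the blanks open and close in nested order."""
    stack = []
    for blank in blanks:
        for closing, name, self_closing in TAG.findall(blank):
            if self_closing:
                continue  # <tag/> neither opens nor closes anything
            if closing:
                if not stack or stack.pop() != name:
                    return False  # close without a matching open
            else:
                stack.append(name)
    return not stack  # anything left on the stack was never closed

original = ["<sometag>", "</sometag>"]   # <b pos="1"/>, <b pos="2"/>
swapped = [original[1], original[0]]     # what a reordering rule may emit
lost = [original[0]]                     # a rule that drops the second blank
```

With these inputs, is_well_nested(original) holds, while both swapped and lost fail the check, which matches the reordering and lost-tag cases described above.
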
> Careful writing of rules may avoid this. In each rule, one can always
> make sure to output superblanks in the same order, and as late as
> possible, so that the format is preserved as much as possible.
>
> But not everything can be avoided this way.
>
> Even if superblanks inside a .t1x chunk are correctly handled, .t2x may
> move chunks around (with their superblanks inside, so nothing can be
> done about it) and lead to invalid HTML or XML.
>
> I see no easy way to solve this without a serious redesign of blank
> management (perhaps by keeping a standoff list of blanks outside the
> stream). But I think it's good to be aware of it.
>
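
A rough Python sketch of what a standoff list could look like, purely as an illustration of the idea and not as a proposal for the actual implementation. It assumes superblanks appear in the stream as square-bracketed spans, ignores escaped brackets, and normalises whitespace; detach and reattach are made-up names.

```python
import re

# Assumption: a superblank is a square-bracketed span; escapes are ignored.
SUPERBLANK = re.compile(r"\[[^\]]*\]")

def detach(stream):
    """Split a stream into (words, standoff); standoff holds (slot, blank)
    pairs, where slot is the number of words preceding the blank."""
    words, standoff = [], []
    pos = 0
    for m in SUPERBLANK.finditer(stream):
        words.extend(stream[pos:m.start()].split())
        standoff.append((len(words), m.group(0)))
        pos = m.end()
    words.extend(stream[pos:].split())
    return words, standoff

def reattach(words, standoff):
    """Put the blanks back at their recorded slots, in their original order."""
    out = []
    for i in range(len(words) + 1):
        # Slots beyond the end are clamped in case the word count shrank.
        out.extend(b for slot, b in standoff if min(slot, len(words)) == i)
        if i < len(words):
            out.append(words[i])
    return " ".join(out)

words, standoff = detach("[<b>]big house[</b>] here")
reordered = [words[1], words[0], words[2]]  # stand-in for a transfer rule
result = reattach(reordered, standoff)
```

Because the rules only ever see the words, the blanks come back in their original order however the words were moved. The price is that the formatting may end up covering different words than in the source, so this trades validity for precision.
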

Matxin's format (which is already supported by some of the tools)
might be a good starting point for this, but it would be best to use
an XML parser for XML-based formats. You mentioned ITS support as a
wishlist item not too long ago, which would make parsing a
requirement; perhaps it would be best to bundle the two together for a
GSoC project.
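
As a small illustration of what a real XML parser gives you here, the Python sketch below uses the standard library's xml.etree.ElementTree to test whether translated output is still well-formed. The function name and the sample strings are made up for the example; it checks well-formedness only, not validity against a schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

before = "<p><b>hello</b> world</p>"
after = "<p></b>hello<b> world</p>"  # the two tags swapped by a rule
```

Here is_well_formed(before) succeeds and is_well_formed(after) does not, so a check like this could at least flag broken output after translation.
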

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
