On 24 December 2013 15:34, Mikel Forcada <m...@dlsi.ua.es> wrote:
> Hi all,
>
> As part of my work with students in the Google Code-In (notably
> galaxyfeeder) I have found a limitation in the current design of
> Apertium, as regards handling of format tags (encapsulated as
> superblanks).
>
> I would appreciate it very much if someone has time to turn this
> message into a proper bug report, although, as will be seen, rather
> than a bug, it is a design limitation.
>
> Since transfer rules (.t1x, .t2x) have to move superblanks around
> explicitly, it may be the case that valid HTML or XML is rendered
> invalid. For instance, a translated ODT file may not open, or a
> translated XHTML page may not be valid.
>
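To make the failure concrete, here is a minimal sketch in Python. It does not use any Apertium code: the `render` and `is_well_formed` helpers are hypothetical, blanks are modelled as plain strings sitting between words, and the "rule" is just a list reversal, so treat it as an illustration of the problem and not of how transfer actually works.

```python
import xml.etree.ElementTree as ET


def render(words, blanks):
    """Interleave words with the blanks between them.

    Assumes len(blanks) == len(words) - 1, i.e. one blank per gap.
    """
    out = [words[0]]
    for blank, word in zip(blanks, words[1:]):
        out.append(blank)
        out.append(word)
    return "".join(out)


def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring("<p>" + fragment + "</p>")
        return True
    except ET.ParseError:
        return False


words = ["a", "big", "house"]
# The two blanks between the three words; each one carries a format tag.
blanks = [" <i>", "</i> "]

# Source order: the tags are balanced.
source = render(words, blanks)

# Hypothetical rule that reverses the words and moves each blank along
# with them: the closing tag now comes before the opening tag.
reordered = render(words[::-1], blanks[::-1])

# Same word reordering, but the blanks are output in their original order.
kept = render(words[::-1], blanks)

for label, text in [("source", source), ("reordered", reordered), ("kept", kept)]:
    print(label, repr(text), is_well_formed(text))
```

Keeping the blanks in their original order preserves well-formedness here, but the markup then wraps the same position in the sentence, not necessarily the word it wrapped in the source.
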
This is a known issue (e.g., Jacob mentions it in this thread from 2009:
http://sourceforge.net/mailarchive/forum.php?thread_name=20cf28cd0904300204v45f35e51i118f4d146f83748%40mail.gmail.com&forum_name=apertium-stuff)

> For instance a rule can move around <b pos="1"/> and <b pos="2"/>. If
> <b pos="1"/> is "<sometag>" and <b pos="2"/> is "</sometag>", the
> result is that </sometag> comes before <sometag>, leading to invalid
> XML or HTML.
>
> Similar validity errors may be introduced when tags are lost or repeated.
>
> Careful writing of rules may avoid this. In each rule, one can always
> make sure to output superblanks in the same order, and as late as
> possible, so that the format is preserved as much as possible.
>
> But not everything can be avoided this way.
>
> Even if superblanks inside a .t1x chunk are correctly handled, .t2x may
> move chunks around (with their superblanks inside, so nothing can be
> done about it) and lead to invalid HTML or XML.
>
> I see no easy way to solve this without a serious redesign of blank
> management (perhaps by keeping a standoff list of blanks outside the
> stream). But I think it's good to be aware of it.
>

Matxin's format (which is already supported by some of the tools) might
be a good starting point for this, but it would be best to use an XML
parser for XML-based formats. You mentioned ITS support as a wishlist
item not too long ago, which would make parsing a requirement; perhaps
it would be best to bundle the two together for a GSoC project.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff