Hi all,

As part of my work with students in the Google Code-In (notably galaxyfeeder), I have found a limitation in the current design of Apertium regarding the handling of format tags (encapsulated as superblanks).
I would very much appreciate it if someone had time to turn this message into a proper bug report, although, as will be seen, this is a design limitation rather than a bug.

Since transfer rules (.t1x, .t2x) have to move superblanks around explicitly, valid HTML or XML may be rendered invalid. For instance, a translated ODT file may not open, or a translated XHTML page may not be valid. For example, a rule can move <b pos="1"/> and <b pos="2"/> around. If <b pos="1"/> is "<sometag>" and <b pos="2"/> is "</sometag>", and the rule outputs them in reverse order, </sometag> comes before <sometag>, leading to invalid XML or HTML. Similar validity errors may be introduced when tags are lost or repeated.

Careful writing of rules may avoid this: in each rule, one can always make sure that superblanks are output in their original order, and as late as possible, so that the format is preserved as much as possible. But not everything can be avoided this way. Even if superblanks inside a .t1x chunk are correctly handled, .t2x may move chunks around (with their superblanks inside, so nothing can be done about it) and lead to invalid HTML or XML.

I see no easy way to solve this without a serious redesign of blank management (perhaps by keeping a standoff list of blanks outside the stream). But I think it's good to be aware of it. For ODT documents, the only solution I see is some kind of clean-up repair after transfer, of the kind the program tidy does to HTML.

Cheers

Mikel

P.S. Two of my students are also writing reports on two more deformatter/reformatter bugs. I will post them when they are done.

--
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326
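To make the ordering problem concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical model in which a rule's output is just the concatenation of its words and superblanks in whatever order the rule chooses (this is not how Apertium's transfer module is actually implemented), with English "the <i>red</i> car" translated word by word into Spanish. The helper names render, is_well_formed and naive_repair are illustrative only. naive_repair is a crude tidy-like clean-up, not what tidy itself does: it drops closing tags that have no matching open tag and closes any tags left open.

```python
import re
from typing import List, Tuple

# Matches an opening, closing or self-closing tag: group 1 is "/" for
# closing tags, group 2 is the tag name, group 3 is "/" for self-closing tags.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w:.-]*)[^>]*?(/?)>")


def render(words: List[str], blanks: List[str], order: List[Tuple[str, int]]) -> str:
    """Hypothetical model of a rule's output: emit words ('w', i) and
    superblanks ('b', i) in the order the rule chooses (1-based, as in <b pos="1"/>)."""
    pool = {"w": words, "b": blanks}
    return "".join(pool[kind][pos - 1] for kind, pos in order)


def is_well_formed(text: str) -> bool:
    """Stack-based nesting check over the tags only; other text is ignored."""
    stack = []
    for m in TAG_RE.finditer(text):
        closing, name, selfclosing = m.groups()
        if selfclosing:
            continue
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack


def naive_repair(text: str) -> str:
    """Crude post-transfer clean-up: drop unmatched closing tags,
    close any tags still open at the end of the text."""
    out, stack, last = [], [], 0
    for m in TAG_RE.finditer(text):
        closing, name, selfclosing = m.groups()
        out.append(text[last:m.start()])
        last = m.end()
        if selfclosing:
            out.append(m.group(0))
        elif not closing:
            stack.append(name)
            out.append(m.group(0))
        elif name in stack:
            # Close any tags opened inside this one, then this one.
            while stack:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
        # Otherwise: a closing tag with no matching open tag, dropped.
    out.append(text[last:])
    out.extend(f"</{name}>" for name in reversed(stack))
    return "".join(out)


if __name__ == "__main__":
    # Source "the <i>red</i> car" = w1 b1 w2 b2 w3, translated word by word.
    words = ["el", "rojo", "coche"]  # w1 = the, w2 = red, w3 = car
    blanks = [" <i>", "</i> "]       # b1, b2
    # A careless adjective-noun rule that also swaps the superblanks.
    bad = render(words, blanks, [("w", 1), ("b", 2), ("w", 3), ("b", 1), ("w", 2)])
    # A careful rule that keeps the superblanks in their original order.
    good = render(words, blanks, [("w", 1), ("b", 1), ("w", 3), ("b", 2), ("w", 2)])
    print(bad, is_well_formed(bad))    # el</i> coche <i>rojo False
    print(good, is_well_formed(good))  # el <i>coche</i> rojo True
    print(naive_repair(bad))           # el coche <i>rojo</i>
```

The careful rule yields well-formed output but italicizes the wrong word, which is the "preserved as much as possible" trade-off. In this particular case the repaired output happens to italicize the right word, but in general such a pass only restores well-formedness; it cannot recover where the formatting was meant to go.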
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff