Hi all,

As part of my work with students in the Google Code-In (notably 
galaxyfeeder), I have found a limitation in the current design of 
Apertium as regards the handling of format tags (encapsulated as 
superblanks).

I would appreciate it very much if someone has time to turn this message 
into a proper bug report, although, as will be seen, it is a design 
limitation rather than a bug.

Since transfer rules (.t1x, .t2x) have to move superblanks around 
explicitly, it may be the case that valid HTML or XML is rendered 
invalid. For instance, a translated ODT file may not open, or a 
translated XHTML page may not be valid.

For instance a rule can move around <b pos="1"/> and <b pos="2"/>. If <b 
pos="1"/> is "<sometag>" and <b pos="2"/> is  "</sometag>", the result 
is that </sometag> comes before <sometag>, leading to invalid XML or HTML.

Similar validity errors may be introduced when tags are lost or repeated.
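
To make the failure concrete, here is a minimal sketch in Python (not part 
of Apertium) of a well-formedness check on the tags carried by superblanks. 
The function name is_balanced and the simplified tag pattern are my own 
assumptions for illustration; real ODT or XHTML would need a proper parser.

```python
import re

# Simplified tag pattern (an assumption): optional "/", a name, optional
# attributes, optional trailing "/" for self-closing tags.
TAG = re.compile(r"<(/?)([A-Za-z][\w:.-]*)[^<>]*?(/?)>")

def is_balanced(text):
    """Return True if every opening tag is closed, in the right order."""
    stack = []
    for m in TAG.finditer(text):
        closing, name, self_closing = m.groups()
        if self_closing:
            continue                      # <br/> opens and closes itself
        if closing:
            if not stack or stack.pop() != name:
                return False              # close without a matching open
        else:
            stack.append(name)
    return not stack                      # anything left open is an error

blank1, blank2 = "<sometag>", "</sometag>"
original = blank1 + "word1 word2" + blank2    # blanks in source order
swapped = blank2 + "word2 word1" + blank1     # a rule swapped the blanks
lost = blank1 + "word1 word2"                 # a rule dropped blank 2
```

Here is_balanced(original) is True, while both the swapped and the lost 
variants are rejected.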

Careful writing of rules may avoid this. In each rule, one can always 
make sure to output superblanks in the same order, and as late as possible, 
so that the format is preserved as much as possible.
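
The "same order" part of this discipline can be sketched as follows, in 
Python rather than in transfer-rule syntax. The function 
reorder_keeping_blanks and its arguments are hypothetical; it only models 
the idea that words may be permuted while blanks keep their source order, 
and it does not model emitting them "as late as possible".

```python
def reorder_keeping_blanks(words, blanks, new_order):
    """Permute words but emit the blanks between them in source order.

    words:     the n lexical units matched by a rule
    blanks:    the n-1 blanks (possibly superblanks) between them
    new_order: a permutation of range(n) giving the output word order
    """
    out = []
    for i, idx in enumerate(new_order):
        out.append(words[idx])
        if i < len(blanks):
            out.append(blanks[i])     # blank i always comes out i-th
    return "".join(out)

# Three words, with a format tag opened in blank 1 and closed in blank 2.
result = reorder_keeping_blanks(["a", "b", "c"], [" <b>", "</b> "], [2, 1, 0])
```

The result is "c <b>b</b> a": the tags still nest correctly, although the 
formatting may end up around a different word than in the source.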

But not everything can be avoided this way.

Even if superblanks inside a .t1x chunk are correctly handled, .t2x may 
move chunks around (with their superblanks inside, so nothing can be 
done about it) and lead to invalid HTML or XML.

I see no easy way to solve this without a serious redesign of blank 
management (perhaps by keeping a standoff list of blanks outside the 
stream). But I think it's good to be aware of it.

For ODT documents, the only solution I see is some kind of clean-up 
repair, of the kind the program tidy does to HTML, done after transfer.
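
As a rough illustration of what such a post-transfer repair could look 
like, here is a minimal sketch in Python. It is not how tidy works and not 
an existing Apertium tool; the function name repair and the simplified tag 
pattern are assumptions. It drops closing tags that have no matching 
opening tag and closes whatever is still open.

```python
import re

# Simplified tag pattern (an assumption), as a real parser would be needed
# for actual ODT or XHTML content.
TAG = re.compile(r"<(/?)([A-Za-z][\w:.-]*)[^<>]*?(/?)>")

def repair(text):
    """Return text with stray closing tags dropped and open tags closed."""
    out, stack, pos = [], [], 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])   # copy text before this tag
        pos = m.end()
        closing, name, self_closing = m.groups()
        if self_closing:
            out.append(m.group(0))
        elif not closing:
            stack.append(name)
            out.append(m.group(0))
        elif name in stack:
            # Close inner elements that are still open, then this one.
            while stack:
                top = stack.pop()
                out.append("</" + top + ">")
                if top == name:
                    break
        # Otherwise: a closing tag with no opening tag, so drop it.
    out.append(text[pos:])
    out.extend("</" + name + ">" for name in reversed(stack))
    return "".join(out)
```

For the swapped case, repair("</sometag>word2 word1<sometag>") gives 
"word2 word1<sometag></sometag>": the result is well-formed again, but the 
formatting no longer covers the words, so this is damage control rather 
than a real fix.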

Cheers

Mikel

P.S. Two of my students are also writing reports on two more 
deformatter/reformatter bugs. I will post them when they are done.

-- 
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326

