Mikel Forcada <m...@dlsi.ua.es> writes: [...]
> Since transfer rules (.t1x, .t2x) have to move superblanks around
> explicitly, it may be the case that valid HTML or XML is rendered
> invalid. For instance, a translated ODT file may not open, or a
> translated XHTML page may not be valid.
>
> For instance, a rule can move around <b pos="1"/> and <b pos="2"/>.
> If <b pos="1"/> is "<sometag>" and <b pos="2"/> is "</sometag>", the
> result is that </sometag> comes before <sometag>, leading to invalid
> XML or HTML.
>
> Similar validity errors may be introduced when tags are lost or repeated.
>
> Careful writing of rules may avoid this. In each rule, one can always
> make sure to output superblanks in the same order, and as late as
> possible, so that the format is preserved as much as possible.
>
> But not everything can be avoided this way.
>
> Even if superblanks inside a .t1x chunk are correctly handled, .t2x may
> move chunks around (with their superblanks inside, so nothing can be
> done about it) and lead to invalid HTML or XML.
>
> I see no easy way to solve this without a serious redesign of blank
> management (perhaps by keeping a standoff list of blanks outside the
> stream). But I think it's good to be aware of it.
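
To make the quoted failure concrete, here is a minimal Python sketch. It is not Apertium code: apply_rule, is_well_formed and the three-word example are made up for illustration, and a rule's output is modelled simply as a list of word and blank positions, loosely mirroring <clip pos="N"/> and <b pos="N"/>.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False

def apply_rule(words, blanks, output):
    """Hypothetical stand-in for a transfer rule's output section.

    `output` is a list of ("w", pos) or ("b", pos) items with 1-based
    positions: "w" emits a word, "b" emits the superblank at that position.
    """
    parts = []
    for kind, pos in output:
        parts.append(words[pos - 1] if kind == "w" else blanks[pos - 1])
    return "".join(parts)

# Input stream: a <i>red</i> car
words = ["a", "red", "car"]
blanks = [" <i>", "</i> "]  # blank 1 opens the tag, blank 2 closes it

# No reordering: words and blanks come out as they went in.
in_order = [("w", 1), ("b", 1), ("w", 2), ("b", 2), ("w", 3)]

# Words 2 and 3 swapped, and their blanks moved along with them.
swapped = [("w", 1), ("b", 2), ("w", 3), ("b", 1), ("w", 2)]

# Words 2 and 3 swapped, but blanks kept in their original order.
careful = [("w", 1), ("b", 1), ("w", 3), ("b", 2), ("w", 2)]

for name, rule in [("in_order", in_order), ("swapped", swapped), ("careful", careful)]:
    result = apply_rule(words, blanks, rule)
    print(name, repr(result), is_well_formed(result))
```

In this toy model, the swapped rule puts </i> before <i> and the result no longer parses, while the careful rule stays well-formed but now wraps a different word in the tag.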

There are a lot of related issues with respect to blank handling:

* We can't reorder word-related blanks when we reorder words, because it
  would invalidate HTML
** Meaning the wrong words get bolded, and we can never id words to
   figure out which input words turned into which output words
* In fact, we can't output any superblanks inside chunks at all, because
  t2x might reorder the chunks and invalidate HTML
* Transfer rule writers have to be very careful with both outputting all
  <b pos="N"/> _and_ outputting them in the right order _and_ not
  putting any of them inside chunks (I think most of us weren't even
  aware of that last issue before)

I've taken some time to write up both the problems, and a sketch of a
solution based on Tino's suggestions, with how it could look in
Apertium, here: http://wiki.apertium.org/wiki/Reordering_superblanks

The sketched solution seems to me like it should deal with all of the
above issues. Comments please :-)

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C
_______________________________________________ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff