[Libreoffice-bugs] [Bug 76021] FORMATTING: Libre Office Writer: save As HTML results in interlaced and tags

bugzilla-daemon Sat, 15 Mar 2014 03:27:07 -0700

https://bugs.freedesktop.org/show_bug.cgi?id=76021
--- Comment #11 from Patrick Goetz <[email protected]> ---
> If you want a valid XML document export it as XHTML, which is actually using
> XML as a base.

The problem with this is that the xhtml I get when I use "Export to xhtml" is, in my opinion, quite bizarre (however, similar to what you get with "Publish to the Web" using Google Docs). Using the attached .docx file as a starting point, this is what I get when I export to xhtml (snippet of file):

    ComplainantÂ shall mean (a)Â theÂ anyÂ person or persons from whom the Intake Officer receives information concerning an OffenseÂ and who, upon consent of that person(s), is designated a Complainant by the Intake OfficerÂ or (b) any Injured Person designated by the Bishop Diocesan who in the Bishop Diocesanâ€™s discretion, should be afforded the status of a Complainant, provided, however, that any Injured Person so designated may decline such designation.

(Ignoring that vim on the Windows XP machine I'm using is not reading the UTF-8 characters correctly), notice that common tags such as and are being inserted as classes using the tag. In this case, .T1 maps to single CSS attribute:

    .T1 { font-weight:bold; }

In a longer version of the same document (i.e. including more text from the same original document) you get more complex classes:

    .T1 { font-size:10pt; font-weight:bold; }
    .T13 { font-style:italic; }
    .T14 { font-style:italic; }
    .T15 { font-style:italic; }
    .T16 { font-style:italic; text-decoration:underline; }
    .T17 { font-style:italic; text-decoration:underline; }
    .T18 { font-style:italic; }
    .T19 { font-style:italic; font-weight:bold; }
    .T20 { font-style:italic; font-weight:bold; }
    .T21 { font-style:italic; font-weight:bold; }
    .T22 { font-style:italic; font-weight:bold; }
    .T26 { padding:0in; border-style:none; }
    .T27 { text-decoration:underline; }
    .T28 { text-decoration:underline; padding:0in; border-style:none; }
    .T29 { font-style:italic; text-decoration:underline; }

This is both unreadable and hard to parse.
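To make the class-to-CSS mapping described above concrete, here is a rough Python sketch of how generated classes such as .T1 might be folded back into simple inline tags. It is only an illustration under several assumptions: the `DECL_TO_TAG` table, the function names `parse_class_rules` and `render`, and the choice of `b`, `i` and `u` as target tags are all hypothetical, and the sample markup is a simplified stand-in, not the actual exported file.

```python
import re
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape

# Assumed mapping from single CSS declarations to simple tags (illustrative).
DECL_TO_TAG = {
    ("font-weight", "bold"): "b",
    ("font-style", "italic"): "i",
    ("text-decoration", "underline"): "u",
}


def parse_class_rules(css):
    """Return {class_name: [tag, ...]} for rules shaped like '.T1 { a:b; }'."""
    rules = {}
    for name, body in re.findall(r"\.(\w+)\s*\{([^}]*)\}", css):
        tags = []
        for decl in body.split(";"):
            if ":" not in decl:
                continue
            prop, value = (part.strip() for part in decl.split(":", 1))
            tag = DECL_TO_TAG.get((prop, value))
            if tag:
                tags.append(tag)  # declarations without a mapping are dropped
        rules[name] = tags
    return rules


def render(elem, rules):
    """Serialize elem, replacing class-bearing spans with simple tags.

    Attributes on non-span elements are discarded in this sketch.
    """
    inner = escape(elem.text or "")
    for child in elem:
        inner += render(child, rules) + escape(child.tail or "")
    local = elem.tag.rsplit("}", 1)[-1]  # ignore any XML namespace prefix
    if local == "span":
        for tag in reversed(rules.get(elem.get("class", ""), [])):
            inner = f"<{tag}>{inner}</{tag}>"
        return inner
    return f"<{local}>{inner}</{local}>"


css = ".T1 { font-weight:bold; } .T16 { font-style:italic; text-decoration:underline; }"
sample = '<p><span class="T1">Complainant</span> shall <span class="T16">mean</span> it</p>'
print(render(ET.fromstring(sample), parse_class_rules(css)))
# prints: <p><b>Complainant</b> shall <i><u>mean</u></i> it</p>
```

Because the lookup goes through the parsed rules rather than hard-coded class names, the sketch would not care if the class numbering shifts between exports, as long as the stylesheet exported with the document is parsed each time.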
Moreover, if I take exactly the same document and add some text, then all these classes change! Also note the strange duplication of classes that do exactly the same thing (.T13, .T14, .T15, .T18).

In my application, what I need to do is extract the text, preserving simple formatting such as , , , and (deprecated) <strike> in order to paste this content into another xml document. This is do-able using the exported xhtml, but extremely onerous, since, for example, it will require at least 2 passes through a parser: first to add the simple xhtml tags I want (, ) that weren't included in the first place, then another pass to strip out all the remaining classes and other xhtml coding that I don't want.

I can't fathom why KISS isn't being applied here: use basic xhtml tags whenever possible in order to keep the output readable and sane. I've written a fair amount of XML parsing code myself, so I do know something about it. I can't help but think this is an example of incredibly lazy programming (unless I'm missing something).

--
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________ Libreoffice-bugs mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
