Package: docbook2odf
Version: 0.244-1
Severity: normal
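When a DocBook XML file contains non-breaking spaces (U+00A0), docbook2odf
generates files that OpenOffice can't read. Here's a diff between an input
that triggers the problem (article2.xml) and one that doesn't (article3.xml):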
[EMAIL PROTECTED]:/tmp$ diff -u article2.xml article3.xml
--- article2.xml 2007-08-06 20:09:24.000000000 +0200
+++ article3.xml 2007-08-06 20:08:59.000000000 +0200
@@ -106,7 +106,7 @@
<tip>
<!-- <title>Astuce</title> -->
<para>Pour taper des équations, on peut utiliser la syntaxe
- LaTeX : <inlineequation>
+ LaTeX : <inlineequation>
<alt>$ \phi = \frac{\sqrt{5}-1}{2} $</alt>
<graphic/>
</inlineequation> sum dolor sit amet, consectetuer adipiscing elit,
sed
[EMAIL PROTECTED]:/tmp$
When I try to load article2.odt, I get an error message that says:
,----
| Read-Error.
| Format error discovered in the file in sub-document content.xml at
| 623,26(row,col).
`----
article3.odt works fine. If I unzip article2.odt, here's what
content.xml contains around line 623:
[EMAIL PROTECTED]:/tmp$ nl -ba content.xml | grep -C5 '^ *623'
618 <text:p text:style-name="para-padding">important</text:p>
619
620
621
622 <text:p text:style-name="para-padding">Pour taper des
équations, on peut utiliser la syntaxe
623 LaTeX<text:s text:c="1"/>�:
624 $ \phi = \frac{\sqrt{5}-1}{2} $
625
626 sum dolor sit amet, consectetuer adipiscing elit, sed
627 diam nonummy nibh euismod tincidunt ut lsum dolor sit amet,
consectetuer adipiscing elit, sed
628 diam nonummy nibh euismod tincidunt ut lsum dolor sit amet,
consectetuer adipiscing elit, sed
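The same check can be done without unzipping by hand. Below is a minimal Python sketch, assuming the problem is a byte in content.xml that is not valid UTF-8. The helper names (first_invalid_utf8, check_odt) are mine and are not part of docbook2odf. The column is counted in bytes, so it may not match the column OpenOffice reports exactly.

```python
import zipfile


def first_invalid_utf8(data: bytes):
    """Return (row, col), both 1-based, of the first byte that is not
    valid UTF-8, or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset of the offending byte.
        row = data.count(b"\n", 0, exc.start) + 1
        line_start = data.rfind(b"\n", 0, exc.start) + 1
        col = exc.start - line_start + 1  # column counted in bytes
        return row, col
    return None


def check_odt(odt, member="content.xml"):
    """Open an .odt (a ZIP archive) and check one member for bad UTF-8.
    `odt` may be a file name or a binary file-like object."""
    with zipfile.ZipFile(odt) as archive:
        return first_invalid_utf8(archive.read(member))
```

Calling check_odt("article2.odt") should then point at the line holding the stray byte, if the assumption above holds.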
The funky character before the colon at line 623 seems to be my
non-breaking space, only not properly encoded:
[EMAIL PROTECTED]:/tmp$ nl -ba content.xml | grep -C5 '^ *623' | recode l1..u8
618 <text:p text:style-name="para-padding">important</text:p>
619
620
621
622 <text:p text:style-name="para-padding">Pour taper des
Ã©quations, on peut utiliser la syntaxe
623 LaTeX<text:s text:c="1"/> :
624 $ \phi = \frac{\sqrt{5}-1}{2} $
625
626 sum dolor sit amet, consectetuer adipiscing elit, sed
627 diam nonummy nibh euismod tincidunt ut lsum dolor sit amet,
consectetuer adipiscing elit, sed
628 diam nonummy nibh euismod tincidunt ut lsum dolor sit amet,
consectetuer adipiscing elit, sed
Note the "Ã©" sequence, which is what "é" looks like when it is encoded
from Latin-1 to UTF-8 once too often. The NBSP on the line below does come
out properly this time.
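To make the reasoning concrete, here is a small Python sketch of what I think is happening. It assumes content.xml mixes correctly encoded UTF-8 text ("é") with a raw Latin-1 NBSP byte (0xA0). The function names are made up for the illustration, and recode_l1_to_u8 only approximates what `recode l1..u8` does.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode as UTF-8 without error."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def recode_l1_to_u8(data: bytes) -> bytes:
    """Treat every byte as Latin-1 and re-encode the result as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


# "é" correctly encoded as UTF-8, followed by a raw Latin-1 NBSP and a colon.
mixed = "é".encode("utf-8") + b"\xa0:"

# The lone 0xA0 byte is not a valid UTF-8 sequence, so a strict
# XML parser has to reject the file.
print(is_valid_utf8(mixed))

# After recoding, the NBSP is fine but "é" has been encoded twice.
recoded = recode_l1_to_u8(mixed)
print(is_valid_utf8(recoded))
print(recoded.decode("utf-8"))
```

The first check fails and the second succeeds, and the recoded text starts with "Ã©", which matches what the recode output shows.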
Roland.
-- System Information:
Debian Release: lenny/sid
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.22-1-k7 (SMP w/1 CPU core)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages docbook2odf depends on:
ii libarchive-zip-perl 1.18-1 Module for manipulation of ZIP arc
ii libxml-sablot-perl 1.0-2 encapsulation of the Sablotron XSL
ii perl 5.8.8-7 Larry Wall's Practical Extraction
ii perlmagick 7:6.2.4.5.dfsg1-1 A perl interface to the libMagick
ii zip 2.32-1 Archiver for .zip files
docbook2odf recommends no packages.
-- no debconf information