Package: docbook2odf
Version: 0.244-1
Followup-For: Bug #436275
Tags: patch
Hello,
I ran into the same problem: no-break spaces are converted into invalid ODT.
Looking into docbook2odf, the problem seems to come from this line:
$content=~s|([\xC2\x82]+)|'<text:s text:c="'.length($1).'"/>'|eg;
In UTF-8, a no-break space (U+00A0) is encoded as the two bytes C2 A0. The
character class in this regex matches C2 or 82, so it only consumes the first
byte. The second byte, A0, is left behind, and a lone A0 is not valid UTF-8.
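To see this at the byte level, here is a minimal sketch. It assumes that $content holds raw UTF-8 bytes rather than decoded characters. The sample string and the variable name $old are made up for illustration and are not taken from docbook2odf.

```perl
use strict;
use warnings;

# Assumption: $content holds raw UTF-8 bytes, not decoded characters.
my $content = "a\xC2\xA0b";    # "a", one no-break space (C2 A0), "b"

# The current substitution: a character class matching byte C2 or byte 82.
(my $old = $content) =~ s|([\xC2\x82]+)|'<text:s text:c="'.length($1).'"/>'|eg;

# Only C2 was consumed, so the A0 byte is still present in the result.
print $old =~ /\xA0/ ? "stray A0 byte left behind\n" : "no stray byte\n";
```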
With this substitution instead:
$content=~s|(\xC2\xA0)+|'<text:s text:c="'.length($1).'"/>'|eg;
it seems to work fine. A patch is included below, if you would like to apply it.
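One caveat about the count, as a side note: in Perl, a quantified capture group such as (\xC2\xA0)+ keeps only its last repetition in $1, so length($1) is always 2 on a byte string, however many no-break spaces were matched. If the text:c count matters, a variant could capture the whole run and divide by the two-byte width. The sketch below again assumes that $content holds raw UTF-8 bytes. The variable names and the $counted variant are hypothetical and not part of the patch.

```perl
use strict;
use warnings;

# Assumption: $content holds raw UTF-8 bytes, as in the report above.
my $content = "a\xC2\xA0\xC2\xA0\xC2\xA0b";    # three consecutive no-break spaces

# The patched substitution: $1 is only the last repetition of the group (2 bytes).
(my $patched = $content) =~ s|(\xC2\xA0)+|'<text:s text:c="'.length($1).'"/>'|eg;

# Hypothetical variant: capture the whole run, then divide by the 2-byte width.
(my $counted = $content) =~ s|((?:\xC2\xA0)+)|'<text:s text:c="'.(length($1)/2).'"/>'|eg;

print "$patched\n";    # a<text:s text:c="2"/>b
print "$counted\n";    # a<text:s text:c="3"/>b
```

Both forms remove the stray A0 byte. They differ only in the number written to text:c.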
Regards,
--
Tanguy Ortolo
-- System Information:
Debian Release: 5.0.1
APT prefers stable
APT policy: (990, 'stable'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.26-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages docbook2odf depends on:
ii libarchive-zip- 1.18-1 Module for manipulation of ZIP arc
ii libxml-sablot-p 1.0-2+b1 encapsulation of the Sablotron XSL
ii perl 5.10.0-19 Larry Wall's Practical Extraction
ii perlmagick 7:6.3.7.9.dfsg2-1~lenny1 Perl interface to the libMagick gr
ii zip 2.32-1 Archiver for .zip files
docbook2odf recommends no packages.
docbook2odf suggests no packages.
-- no debconf information
--- docbook2odf.old 2009-05-19 10:05:28.000000000 +0200
+++ docbook2odf.new 2009-05-19 10:05:50.000000000 +0200
@@ -366,7 +366,7 @@
}
# convert alternative nbsp character to ODF spaces
- $content=~s|([\xC2\x82]+)|'<text:s text:c="'.length($1).'"/>'|eg;
+ $content=~s|(\xC2\xA0)+|'<text:s text:c="'.length($1).'"/>'|eg;
};
print "\n" if $debug;
@@ -533,4 +533,4 @@
}
-1;
\ No newline at end of file
+1;