Package: docbook2odf
Version: 0.244-1
Followup-For: Bug #436275
Tags: patch

Hello,

I ran into the same problem: no-break spaces are converted into invalid ODT.
Looking at docbook2odf, the problem seems to come from this line:
    $content=~s|([\xC2\x82]+)|'<text:s text:c="'.length($1).'"/>'|eg;
In UTF-8, a no-break space (U+00A0) is encoded as the two bytes C2 A0. The
character class [\xC2\x82] matches only the first of these bytes (C2), so the
trailing A0 is left behind, and a lone A0 byte is not valid UTF-8.
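
To make the failure concrete, here is a minimal sketch that runs the original
substitution on a sample string. The sample and the $shown helper variable are
only for illustration, and I am assuming that $content holds raw UTF-8 bytes
at this point in the script, as the behaviour above suggests.

```perl
use strict;
use warnings;

# Hypothetical sample: "a", one no-break space (bytes C2 A0), "b".
my $content = "a\xC2\xA0b";

# Original substitution from docbook2odf: a character class of C2 and 82.
$content =~ s|([\xC2\x82]+)|'<text:s text:c="'.length($1).'"/>'|eg;

# Print non-ASCII bytes as hex escapes so the stray A0 is visible.
(my $shown = $content) =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord($1))/ge;
print "$shown\n";   # prints: a<text:s text:c="1"/>\xA0b
```

Only the C2 byte is replaced, and the orphaned A0 stays in the output.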

With this substitution instead:
    $content=~s|(\xC2\xA0)+|'<text:s text:c="'.length($1).'"/>'|eg;
it seems to work fine. So here is a patch, in case you would like to include it.
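
One caveat that may be worth checking: in Perl, a quantified capture group
keeps only its last repetition in $1, so length($1) in the line above is 2
(the byte length of one C2 A0 pair) however long the run of no-break spaces
is. The sketch below compares the patched line with a possible variant that
captures the whole run and halves its byte length. The variant is my own
assumption, not part of the attached patch, and it again assumes that $content
holds raw UTF-8 bytes.

```perl
use strict;
use warnings;

# Hypothetical sample: three consecutive no-break spaces as raw UTF-8 bytes.
my $input = "a\xC2\xA0\xC2\xA0\xC2\xA0b";

# Patched form: $1 holds only the last repetition of the group (2 bytes).
(my $patched = $input) =~ s|(\xC2\xA0)+|'<text:s text:c="'.length($1).'"/>'|eg;

# Variant (an assumption, not in the patch): capture the whole run and divide
# its byte length by 2 to get the number of no-break spaces.
(my $counted = $input) =~ s|((?:\xC2\xA0)+)|'<text:s text:c="'.(length($1)/2).'"/>'|eg;

print "$patched\n";   # prints: a<text:s text:c="2"/>b
print "$counted\n";   # prints: a<text:s text:c="3"/>b
```

Both forms produce valid UTF-8, so either one fixes the invalid ODT. They
differ only in the space count written to text:c.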

Regards,

-- 
Tanguy Ortolo

-- System Information:
Debian Release: 5.0.1
  APT prefers stable
  APT policy: (990, 'stable'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages docbook2odf depends on:
ii  libarchive-zip- 1.18-1                   Module for manipulation of ZIP arc
ii  libxml-sablot-p 1.0-2+b1                 encapsulation of the Sablotron XSL
ii  perl            5.10.0-19                Larry Wall's Practical Extraction 
ii  perlmagick      7:6.3.7.9.dfsg2-1~lenny1 Perl interface to the libMagick gr
ii  zip             2.32-1                   Archiver for .zip files

docbook2odf recommends no packages.

docbook2odf suggests no packages.

-- no debconf information
--- docbook2odf.old     2009-05-19 10:05:28.000000000 +0200
+++ docbook2odf.new     2009-05-19 10:05:50.000000000 +0200
@@ -366,7 +366,7 @@
        }
 
        # convert alternative nbsp character to ODF spaces
-       $content=~s|([\xC2\x82]+)|'<text:s text:c="'.length($1).'"/>'|eg;
+       $content=~s|(\xC2\xA0)+|'<text:s text:c="'.length($1).'"/>'|eg;
 };
 print "\n" if $debug;
 
@@ -533,4 +533,4 @@
 }
 
 
-1;
\ No newline at end of file
+1;
