Bug#646246: gscan2pdf ocropus and html-entities

noreply Sat, 22 Oct 2011 15:09:23 -0700

This looks like a UTF-8 problem.

The following patch seems to fix the editing problem
and the conversion problems on PDF export.


Tesseract text still fails when editing and changing the page,
same as before; that problem has not been found yet.
However, the next bug report will bring a real improvement.


--- /usr/share/perl5/Gscan2pdf/Page.pm  2011-08-27 07:00:41.000000000 +0200
+++ /usr/share/perl5/Gscan2pdf/Page.pm  2011-10-22 23:57:19.492261844 +0200
@@ -11,6 +11,7 @@
 use HTML::TokeParser;
 use HTML::Entities;
 use Image::Magick;
+use Encode;
 use utf8;
 
 BEGIN {
@@ -135,7 +136,7 @@
     }
    }
    if ( $token->[0] eq 'T' and $token->[1] !~ /^\s*$/ ) {
-    $text = HTML::Entities::decode_entities( $token->[1] );
+    $text = HTML::Entities::decode_entities(decode_utf8( $token->[1] ));
     chomp($text);
    }
    if ( $token->[0] eq 'E' ) {
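

For illustration, here is a minimal standalone sketch of what the changed
line does. It assumes the token text arrives as raw UTF-8 octets (my guess
at the cause, not verified against the hOCR reader), and the sample string
is made up:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use HTML::Entities ();

# Hypothetical sample: "ü &amp; ß" as raw UTF-8 octets, as if the OCR
# output had been read without an encoding layer (assumption).
my $bytes = "\xC3\xBC &amp; \xC3\x9F";

# Old behaviour: entities are decoded, but every octet of a multi-byte
# character is still treated as a separate character.
my $undecoded = HTML::Entities::decode_entities($bytes);

# Patched behaviour: turn the octets into characters first, then decode
# the entities.
my $decoded = HTML::Entities::decode_entities( decode_utf8($bytes) );

printf "without decode_utf8: %d characters\n", length $undecoded;
printf "with decode_utf8:    %d characters\n", length $decoded;
```

With the octets decoded first, the two umlauts count as one character each,
which is presumably what the editing and PDF export code expects.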



