This looks like a UTF-8 problem.
The following patch seems to fix the editing problem
and the conversion problems on PDF export.
Tesseract texts still fail when editing them and when changing the page.
Same as before: the cause has still not been found.
However, the next bug report should bring a real improvement.
--- /usr/share/perl5/Gscan2pdf/Page.pm 2011-08-27 07:00:41.000000000 +0200
+++ /usr/share/perl5/Gscan2pdf/Page.pm 2011-10-22 23:57:19.492261844 +0200
@@ -11,6 +11,7 @@
use HTML::TokeParser;
use HTML::Entities;
use Image::Magick;
+use Encode;
use utf8;
BEGIN {
@@ -135,7 +136,7 @@
}
}
if ( $token->[0] eq 'T' and $token->[1] !~ /^\s*$/ ) {
- $text = HTML::Entities::decode_entities( $token->[1] );
+ $text = HTML::Entities::decode_entities(decode_utf8( $token->[1] ));
chomp($text);
}
if ( $token->[0] eq 'E' ) {
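The patch decodes the token text with decode_utf8 before HTML::Entities::decode_entities sees it. Here is a minimal sketch of why that matters, using only the core Encode module. It assumes the tesseract hOCR text reaches the parser as raw UTF-8 bytes (not confirmed in this report), and the sample word is made up for illustration. Without decoding, Perl treats every byte as a separate character, so the text is mangled when it is later edited or re-encoded.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# "été" as raw UTF-8 bytes, as it might arrive from the OCR output
# (assumption: the HTML is read without a decoding layer)
my $bytes = "\xc3\xa9t\xc3\xa9";

# Undecoded, Perl sees one character per byte: 5, not 3
my $byte_len = length $bytes;

# decode_utf8 turns the byte string into a proper character string
my $chars    = decode_utf8($bytes);
my $char_len = length $chars;

printf "bytes: %d, characters: %d\n", $byte_len, $char_len;
```

Decoding once at the point where text enters the program, as the patch does, keeps every later step (entity decoding, editing, PDF export) working on characters rather than bytes.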