Current status:

* With sid's pstotext and ghostscript, I can no longer reproduce the errors
  with the files supplied by Michiel (pstotext_bug.pdf, 19981108_070.pdf,
  19981108_073.pdf, 19981108_091.pdf, 20000329_012.pdf).

* To examine the remaining issues at the PostScript level more easily, run
        gs -r72 -dNODISPLAY -dFIXEDMEDIA -dDELAYBIND -dWRITESYSTEMDICT -q \
                 -dNOPAUSE -dSAFER ocr.ps
  (ocr.ps is part of the pstotext sources).

On Sat, Jul 12, 2008 at 20:44:03 +0200, Laurent Bonnaud wrote:
> a similar problem exists with PDF files from Debian packages, which is a
> problem for dhelp (see bug #475655).  Here are some examples:
> 
> $ zcat /usr/share/doc/ctsim-doc/ctsim.pdf.gz | pstotext > /dev/null
> GPL Ghostscript 8.62: Unrecoverable error, exit code 1

At the point of the error, the currentfont operator has returned a font
dictionary without a "FontBBox" key, which ocr.ps assumes is always present.
To continue execution beyond this point, a placeholder bounding box can be
substituted as follows:

diff -ru pstotext-1.9/ocr.ps pstotext-1.9-patched/ocr.ps
--- pstotext-1.9/ocr.ps 2004-01-09 13:00:30.000000000 +0100
+++ pstotext-1.9-patched/ocr.ps 2008-08-21 15:16:50.000000000 +0200
@@ -366,7 +366,12 @@
   % Print bounding box and character metrics for currentfont
   % Sadly, dvitops produces illegal type 3 fonts with no /.notdef entry. The
   % use of "stopped" deals with that and any other silliness.
-  currentfont /FontBBox get aload pop 4 2 roll
+  currentfont /FontBBox known
+  { currentfont /FontBBox get }
+  { [-100 -250 1000 800] % Fake it
+  }
+  ifelse
+  aload pop 4 2 roll
   //showxy exec
   //showxy exec
   currentfont /FontMatrix get

> $ zcat /usr/share/doc/python-egenix-mxdatetime/mxDateTime.pdf.gz | pstotext > /dev/null

> $ zcat /usr/share/doc/python-egenix-mxtexttools/mxTextTools.pdf.gz | pstotext > /dev/null

> $ zcat /usr/share/doc/python-egenix-mxtools/mxTools.pdf.gz | pstotext > /dev/null

For these three documents, there appears to be a single underlying problem
which is not as easy to fix. Forcing ocr.ps's printMetrics to always take the
code path written for the illegal type 3 fonts produced by dvitops gets past
the initial failure, but processing still fails later on. Further
investigation is needed.
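
To recheck the files from this report after changing ocr.ps, a small loop
along the following lines may help. This is only a sketch: the check_pdf
function and the PSTOTEXT override variable are names made up for
illustration, and it assumes pstotext reads the PDF from standard input, as in
the commands quoted above.

```shell
#!/bin/sh
# Hypothetical helper: report which gzipped PDFs make pstotext fail.
# PSTOTEXT is an illustrative override; it defaults to the real tool.
PSTOTEXT=${PSTOTEXT:-pstotext}

check_pdf() {
    # Decompress to stdout and feed the PDF to pstotext on stdin.
    # The pipeline's exit status is that of pstotext, the last command.
    if zcat "$1" | "$PSTOTEXT" > /dev/null 2>&1; then
        echo "ok   $1"
    else
        echo "FAIL $1"
    fi
}

# Check every file named on the command line.
for f in "$@"; do
    check_pdf "$f"
done
```

Saved under a name such as check-pdfs.sh (also made up), it could be run as
"sh check-pdfs.sh /usr/share/doc/ctsim-doc/ctsim.pdf.gz" with further files
appended. It only reports the exit status; the Ghostscript error output is
discarded, so the individual commands are still needed to see the messages.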

Ray
-- 
I'm really not a very nice person. I can say "I don't care" with a straight
face, and really mean it.
        Linus Torvalds on the linux-kernel list


