I'm using htdig with pdftotext and parse_doc.pl to index all of my PDF files. However, with one particular PDF file, I get the following error:
[root pdf]# pdftotext ts.pdf
Error: Unknown Type 0 character set: Adobe-Identity
Error: Unknown Type 0 character set: Adobe-Identity
Error: Unknown Type 0 character set: Adobe-Identity

The generated ts.txt looks like this:

-rw-rw-r-- 1 root httpd 3426 Jan 15 16:27 ts.txt

But if I run 'cat ts.txt', it shows no data. Furthermore, pdftohtml extracts the images but not the text, and I get the same error. pdftops produces the same error six times.

I can copy and paste the text if I open the file in Acroread for Windows.

So... any ideas? I can't find information on this anywhere.

Thanks

