I'm using htdig with pdftotext and parse_doc.pl to index all of my PDF
files. However, with one particular PDF file, I get the following error:

[root pdf]# pdftotext ts.pdf
Error: Unknown Type 0 character set: Adobe-Identity
Error: Unknown Type 0 character set: Adobe-Identity
Error: Unknown Type 0 character set: Adobe-Identity

The generated ts.txt is listed as follows:

-rw-rw-r--   1 root     httpd        3426 Jan 15 16:27 ts.txt

But if I 'cat ts.txt', it shows no data. Furthermore, pdftohtml does extract
the images, but not the text, and I get the same error. pdftops produces the
same error 6 times. I can copy and paste the text if I open the file in
Acroread for Windows. So... any ideas? I can't find information on this
anywhere.
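
Since 'cat' shows nothing for a 3426-byte file, one way to check what those
bytes actually are is sketched below. This is only an illustration:
count_nonspace_bytes is a made-up helper name, and the guess that the file
holds only whitespace or form feeds is an assumption, not something
confirmed for ts.txt.

```shell
# Hypothetical helper: count the bytes in a file that are not whitespace.
# A result of 0 would mean the file holds only spaces, tabs, newlines,
# carriage returns, form feeds and similar characters.
count_nonspace_bytes() {
    # LC_ALL=C makes tr treat the input as raw bytes rather than
    # locale-dependent characters; the final tr strips the padding
    # that some wc implementations put in front of the number.
    LC_ALL=C tr -d '[:space:]' < "$1" | wc -c | tr -d ' '
}

# Usage (assumes ts.txt is in the current directory; not run here):
#   count_nonspace_bytes ts.txt
#   od -c ts.txt | head      # dump the first bytes character by character
```

If the count comes back as 0, the file is all whitespace, which would fit
the empty 'cat' output. A non-zero count would point to non-printing bytes
instead, and the od dump shows which ones.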

Thanks

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html