Hello,
I'm trying to set up a document server with htdig under SuSE 8.2. As
part of this I also tried to build an index of PDF files, created with
LyX, using htdig's external_parsers method.
This works for all my PDF files except the ones generated from LyX
sources.
I spent a long time debugging through the following chain of programs:
genhtdig.pl -> htdig -> doc2html.pl -> pdf2html.pl -> pdftotext
In the end I found the following in the man page of pdftotext:
BUGS
Some PDF files contain fonts whose encodings have been
mangled beyond recognition. There is no way (short of
OCR) to extract text from these files.
Question: is that really the reason why pdftotext fails to process
PDF files produced from LyX?
And if so, is there another way to index such files?
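
To test a single file outside of the htdig chain, a minimal sketch like
the following could be used. It assumes pdftotext is on the PATH and is
called with "-" as the output file so that the text goes to stdout. The
helper names and the 0.5 cut-off are only illustrative assumptions, not
part of htdig or xpdf.

```python
import subprocess
import sys


def printable_ratio(text):
    """Return the fraction of non-whitespace characters that are letters or digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalnum() for c in chars) / len(chars)


def extract_text(pdf_path):
    """Run pdftotext on one file and return what it writes to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" as output file means stdout
        capture_output=True,
        check=True,
    )
    # latin-1 never fails to decode; good enough for a rough check
    return result.stdout.decode("latin-1", errors="replace")


if __name__ == "__main__":
    for path in sys.argv[1:]:
        ratio = printable_ratio(extract_text(path))
        # The 0.5 cut-off is an arbitrary assumption, not a documented threshold.
        verdict = "probably usable" if ratio > 0.5 else "probably mangled or empty"
        print(f"{path}: {ratio:.2f} {verdict}")
```

Running it on one working PDF and one LyX PDF should show whether
pdftotext returns nothing at all or only garbage for the LyX file.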
Thank you
bernhard
--
http://home.t-online.de/home/mb.schiekel/
GPG-Key available: GnuPG-1.2.2