I asked earlier whether htdig follows links in PDF files. I now see that the PDF
parser it uses (pdftotext, from the xpdf package) does not extract links.

However, there is a related tool, pdftohtml
(http://www.ra.informatik.uni-stuttgart.de/~gosho/pdftohtml/index.html),
that looks like it can produce a clean HTML page for htdig to munch on. If so,
getting the links followed should be a simple change.

But...

htdig uses parse_doc.pl to call pdftotext, and this script does a lot of
post-processing on pdftotext's output. I don't see why these modifications are needed.

My questions:

- If pdftohtml produces a single clean HTML file from PDF input, can I make
this work just by listing it as the external parser for PDF (in the
external_parsers attribute, with appropriate arguments)?

- Does something else have to be done to tell htdig to process the file as
HTML, with links?
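One way to try the idea is a small wrapper that htdig can call in place of parse_doc.pl. The sketch below is a hypothetical converter script (the name pdf2html.py is made up here). It assumes htdig invokes an external parser as `program <temp-file> <content-type> <URL> <config-file>`, and that your pdftohtml build accepts `-q`, `-i`, `-noframes` and `-stdout`; check `pdftohtml -h` and the external_parsers documentation for your versions. It runs pdftohtml on the temporary file and writes a single HTML page to standard output.

```python
#!/usr/bin/env python3
"""pdf2html.py: hypothetical htdig external converter (PDF in, HTML out).

Assumed calling convention: pdf2html.py <temp-file> <content-type> <url> <config>
Assumed pdftohtml flags: -q (quiet), -i (skip images), -noframes (one page),
-stdout (write the HTML to standard output).
"""
import subprocess
import sys

USAGE = "usage: pdf2html.py <temp-file> <content-type> [url] [config-file]\n"


def build_command(pdf_path, pdftohtml="pdftohtml"):
    """Return the pdftohtml argument list for one PDF file."""
    return [pdftohtml, "-q", "-i", "-noframes", "-stdout", pdf_path]


def convert(pdf_path, pdftohtml="pdftohtml"):
    """Run pdftohtml and return the HTML it printed, as bytes.

    Raises OSError if pdftohtml cannot be started and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(build_command(pdf_path, pdftohtml),
                            stdout=subprocess.PIPE, check=True)
    return result.stdout


def main(argv):
    """Convert argv[1]; argv[2] is the content type htdig reports."""
    pdf_path, content_type = argv[1], argv[2]
    if content_type != "application/pdf":
        sys.stderr.write("pdf2html.py: unexpected content type %s\n" % content_type)
        return 1
    try:
        html = convert(pdf_path)
    except (OSError, subprocess.CalledProcessError) as err:
        sys.stderr.write("pdf2html.py: %s\n" % err)
        return 1
    # Assumption: htdig reads the converted document from our standard output.
    sys.stdout.buffer.write(html)
    return 0


if __name__ == "__main__":
    if len(sys.argv) < 3:
        # Run by hand without htdig's arguments: print usage only.
        sys.stderr.write(USAGE)
    else:
        sys.exit(main(sys.argv))
```

If your ht://Dig release supports the converter form of external_parsers (a `from-type->to-type` pair), a line such as `external_parsers: application/pdf->text/html /usr/local/bin/pdf2html.py` should make htdig run the script's output through its own HTML parser, which is what would pick up the links. With the plain `application/pdf /path/to/parser` form, htdig instead expects the parser to emit its external-parser record format (word, title and heading records), which is likely why parse_doc.pl reworks pdftotext's output so heavily.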

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
