On Mon, Nov 22, 2004 at 02:32:05PM -0600, Cutts III, James H. wrote:
> I am slowly working my way through the process of getting PDF files to
> be indexed by ht://Dig. I've found and installed the xpdf 2.01-11 and
> verified that pdftotext works. I've installed and modified the
> doc2html.pl. I've modified the pdf2html.pl files. And I've created an
> html file that is points to my PDF files and tweaked my htdig.conf to
> include the external_parsers: command.
>
> I run htdig -vv -i -c htdig.conf and I get the following errors
>
> External parser error: can't parse Content-Type "txt/html"
> URL:
> http://128.206.75.187/cori_kbase_jhc/pdfs/missouri_hmo/Commuity%20CarePlus-Hospitals%20Expansion%202-99pc.pdf
>
> Once for each pdf file.
>
> Any suggestions? The file displays nicely in a web browser. I suspect
> that it may be the setup of the Apache server and the mime type that
> it's sending.

The default Content-Type header that Apache sends for HTML is

    Content-Type: text/html; charset=ISO-8859-1

For PDF you probably want

    Content-Type: application/pdf

This should happen automatically with Apache's mime_module. By default,
the mime.types file should contain the line

    application/pdf pdf

Some browsers will figure out from the file extension how to open a PDF
file, so the file displaying nicely in a browser does not show that the
header is correct.

A "Content-Type: txt/html" header is just wrong, I think: the type
should be "text", not "txt". If you fix the header, it should work
better.

-- 
Milan
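
To make the point about the malformed header concrete, here is a rough Python sketch (not part of ht://Dig or Apache) that checks whether a Content-Type value has a recognised top-level type and looks up the type normally associated with a .pdf extension. The function names and the list of top-level types are my own assumptions for illustration; the lookup relies on Python's standard mimetypes module, not on your Apache mime.types file.

```python
import mimetypes

# Assumed list of top-level media types; "txt" is not one of them.
KNOWN_TOP_LEVEL_TYPES = {
    "application", "audio", "font", "image", "message",
    "model", "multipart", "text", "video",
}


def parse_content_type(header_value):
    """Split a Content-Type value into (type, subtype), ignoring parameters."""
    # Drop anything after the first ';' (e.g. "; charset=ISO-8859-1").
    media_type = header_value.split(";", 1)[0].strip().lower()
    main, sep, sub = media_type.partition("/")
    if not sep or not main or not sub:
        raise ValueError("not of the form type/subtype: %r" % header_value)
    return main, sub


def is_plausible_content_type(header_value):
    """Return True if the value parses and its top-level type is known."""
    try:
        main, _ = parse_content_type(header_value)
    except ValueError:
        return False
    return main in KNOWN_TOP_LEVEL_TYPES


def expected_type_for(filename):
    """Return the media type Python's mimetypes table gives for a filename."""
    return mimetypes.guess_type(filename)[0]


if __name__ == "__main__":
    for value in ("text/html; charset=ISO-8859-1", "application/pdf", "txt/html"):
        print(value, "->", is_plausible_content_type(value))
    print("report.pdf ->", expected_type_for("report.pdf"))
```

Run on the value from the error message, the check rejects "txt/html" while accepting the two headers quoted above. It only tests the form of the value; it says nothing about where the bad value is coming from.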