On Mon, Nov 22, 2004 at 02:32:05PM -0600, Cutts III, James H. wrote:
> I am slowly working my way through the process of getting PDF files to
> be indexed by ht://Dig.  I've found and installed the xpdf 2.01-11 and
> verified that pdftotext works. I've installed and modified the
> doc2html.pl.  I've modified the pdf2html.pl files.  And I've created an
> html file that points to my PDF files and tweaked my htdig.conf to
> include the external_parsers: command.
> 
> I run htdig -vv -i -c htdig.conf and I get the following errors 
> 
> External parser error: can't parse Content-Type "txt/html"
>  URL:
> http://128.206.75.187/cori_kbase_jhc/pdfs/missouri_hmo/Commuity%20CarePl
> us-Hospitals%20Expansion%202-99pc.pdf
> 
> Once for each pdf file.
> 
> Any suggestions?  The file displays nicely in a web browser.  I suspect
> that it may be the setup of the Apache server and the mime type that
> it's sending.

The default Content-Type header that Apache sends for HTML is
Content-Type: text/html; charset=ISO-8859-1

For PDF you probably want to use
Content-Type: application/pdf

This should happen automatically with Apache's mime_module. By default,
the mime.types file should contain
application/pdf                      pdf

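Some browsers will figure out from the file extension how to open a
PDF file, so a file that displays nicely in a browser does not prove
that the header is correct.

A "Content-Type: txt/html" header is just wrong, I think; the valid
type is spelled text/html.

If you fix the header, it should work better.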
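
To check what the server really sends, here is a rough Python sketch.
The helper names (media_type, looks_valid, served_content_type) are made
up for illustration and are not part of ht://Dig or Apache; the list of
top-level types is only the common registered ones.

```python
from urllib.request import Request, urlopen

# Common registered top-level media types (an assumption for this sketch).
KNOWN_TOP_LEVEL_TYPES = {
    "application", "audio", "font", "image", "message",
    "model", "multipart", "text", "video",
}


def media_type(header_value):
    """Return the lowercase type/subtype part, dropping parameters such as charset."""
    return header_value.split(";", 1)[0].strip().lower()


def looks_valid(header_value):
    """True if the value has the form type/subtype with a known top-level type."""
    main, sep, sub = media_type(header_value).partition("/")
    return bool(sep) and bool(sub) and main in KNOWN_TOP_LEVEL_TYPES


def served_content_type(url):
    """Send a HEAD request and return the Content-Type header the server reports."""
    with urlopen(Request(url, method="HEAD")) as response:
        return response.headers.get("Content-Type", "")
```

Calling looks_valid(served_content_type(url)) on one of the PDF URLs
shows whether Apache is the source of the bad value. If the server
already reports application/pdf, the "txt/html" string probably comes
from somewhere else in the setup, possibly the external_parsers line in
htdig.conf, so that would be the next place to look.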

--
Milan

