Hi all, I’m up and running, well, sort of.

I installed HTDig and pointed it at a folder containing only .html files to test it, and voilà, I got results. What I actually need it to index, though, is PDF files.

So far I have done the following:

Added to htdig.conf:
external_parsers: application/pdf->text/html /opt/www/htdig/bin/doc2html/doc2html.pl
 
Installed the xpdf RPM
Installed the doc2html directory and scripts
 
Set the paths to pdftotext and pdfinfo, and set the path in doc2html.pl to the pdf2text.pl script
 
I also checked the size of my largest PDF and increased the maximum file size in htdig.conf to cover it.
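
In case it helps anyone reading, here is how I understand the external_parsers line above to break down. This is only a minimal Python sketch of my reading of the format (whitespace-separated pairs of a type spec and a command, where the type spec may carry a "->" output type); the function name parse_external_parsers is made up for illustration, and htdig's own parsing may well differ.

```python
import shlex


def parse_external_parsers(value):
    """Split an external_parsers value into (input_type, output_type, command) tuples.

    Assumption: the value is a sequence of "<type spec> <command>" pairs,
    where a command containing spaces is quoted.
    """
    tokens = shlex.split(value)
    if len(tokens) % 2 != 0:
        raise ValueError("expected pairs of <type spec> <command>")

    entries = []
    for spec, command in zip(tokens[0::2], tokens[1::2]):
        # "application/pdf->text/html" means: take PDF in, hand back HTML.
        input_type, arrow, output_type = spec.partition("->")
        entries.append((input_type, output_type if arrow else None, command))
    return entries


# The exact value from my htdig.conf.
line = "application/pdf->text/html /opt/www/htdig/bin/doc2html/doc2html.pl"
print(parse_external_parsers(line))
```

If that reading is right, the converter should only be called for documents the web server reports as application/pdf.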
 
I run ./rundig -v and it indexes the one HTML document I have at the top level. Permissions on the files should be fine; I actually set them to 777 to make sure it could get into the folders. But it won’t index the PDFs. Any ideas?
 
I don’t receive any error messages either.
 
My file setup is /archives/folder/folder…etc
 
I set start_url for htdig to http://192.168.0.25/archives/
 
I’ve tried moving a .pdf into the /archives folder, but that doesn’t work either.
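
One thing I have not ruled out is whether the web server actually reports the PDFs as application/pdf, since the external_parsers entry is keyed on that type. Below is a rough Python sketch of how that could be checked. The helper names are hypothetical, and I am assuming the server answers HEAD requests.

```python
import urllib.request


def normalize_content_type(header_value):
    """Drop parameters such as '; charset=...' and lowercase the MIME type."""
    return header_value.split(";", 1)[0].strip().lower()


def is_pdf_type(header_value):
    """True if a Content-Type header value names application/pdf."""
    return normalize_content_type(header_value) == "application/pdf"


def fetch_content_type(url):
    """Send a HEAD request and return the raw Content-Type header, or ''."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")
```

For example, is_pdf_type(fetch_content_type("http://192.168.0.25/archives/test.pdf")) would show what the server says about one file; test.pdf is only a placeholder name, not one of my real files.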
 
Thanks!
 
Abbie
 
 

 

 
