Date: Mon, 22 Jul 2002 20:31:04 +0200
To: [EMAIL PROTECTED] 
From: "htdig" <[EMAIL PROTECTED]>
Subject: [htdig] pdf-files?

>I have htdig running mostly as wanted.
>It does not seem to index .pdf files, and I'm sure it has to do with my
>lack of understanding.

>The only thing I found in the FAQ was a too-narrow max_size, which is set
>to 2000000, and my largest .pdf file is about 900000.

Hi!

I'll take a stab at this...

If you are using Apache's automatic directory indexing, you need to make
sure your maximum document size (max_doc_size) exceeds the size of your
largest directory listing (do an ls -l in the directory above where your
docs reside and add some for growth), not just your largest document.
The reason is that htdig reads in the index page, and if that page
exceeds the maximum size, it is truncated, so htdig only finds the docs
listed up to that point.  I had this problem before, and that fixed it.
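As a sketch, the relevant line in htdig.conf might look like the one
below. The attribute ht://Dig uses is max_doc_size (presumably the
"max_size" mentioned above); the value 5000000 is only an illustrative
assumption, so size it above your own largest directory listing:

```
# Illustrative value only: make this larger than the biggest Apache
# directory-listing page htdig has to read, plus room for growth.
max_doc_size: 5000000
```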


>However, I think it has something to do with the lack of a "PDF2TEXT"
>conversion module I have to install???

>Would anyone please enlighten me on what I have to do?
>The environment is RH7.1, Apache and ht://Dig 3.2.0b4

Get doc2html.pl (referenced below) from http://htdig.org/contrib/ and put
the following in your .conf file:

external_parsers: application/rtf->text/html /usr/local/bin/doc2html.pl \
                  text/rtf->text/html /usr/local/bin/doc2html.pl \
                  application/pdf->text/html /usr/local/bin/doc2html.pl \
                  application/postscript->text/html /usr/local/bin/doc2html.pl

You will need Perl
(http://www.activestate.com/Products/Download/Download.plex?id=ActivePerl)
installed to run the script, as well as xpdf
(http://www.foolabs.com/xpdf/), which provides the pdftotext converter.


>finn

HTH!




Bill Akins, CNE
Sr. OSA
Emory Healthcare
(404) 712-2879 - Office
12674 - PIC
[EMAIL PROTECTED]




_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
