I have htdig 3.1.5 happily installed on my SuSE 6.4 box, but it doesn't seem to be
indexing PDF files.
Here are some details:
HTDIG.CONF
pdf_parser /usr/local/bin/acroread -toPostScript -pairs
max_doc_size 300000
(acroread is version 3.1 and will happily convert a sample PDF to PS; all PDFs are
well under the max_doc_size)
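A PDF that is cut off at max_doc_size, or a URL that doesn't actually deliver a PDF, can leave the converter with nothing to extract. A quick local sanity check might look like the sketch below. It is a hypothetical helper, not part of htdig: it compares each file's size with the max_doc_size value above and looks for the %PDF- header and the %%EOF trailer. It only inspects local copies, so it cannot show what the web server actually sent to htdig.

```python
import sys
from pathlib import Path
from typing import List

# Hypothetical check; 300000 mirrors the max_doc_size value in the htdig.conf above.
MAX_DOC_SIZE = 300000


def check_pdf_data(data: bytes, limit: int = MAX_DOC_SIZE) -> List[str]:
    """Return reasons why htdig might not receive a complete, valid PDF."""
    problems = []
    if len(data) > limit:
        problems.append(f"size {len(data)} exceeds max_doc_size {limit}")
    if not data.startswith(b"%PDF-"):
        problems.append("no %PDF- header (may not be a PDF at all)")
    # A complete PDF normally ends with an %%EOF marker in its last few bytes;
    # a missing marker suggests the file was cut short.
    if b"%%EOF" not in data[-1024:]:
        problems.append("no %%EOF marker near the end (file may be truncated)")
    return problems


if __name__ == "__main__":
    # Usage: python3 check_pdfs.py file1.PDF file2.PDF ...
    for name in sys.argv[1:]:
        issues = check_pdf_data(Path(name).read_bytes())
        print(name, "OK" if not issues else "; ".join(issues))
```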
HTDIG -v
lists the PDF files and their sizes correctly (i.e. it looks as though they are being indexed).
However, I don't see the '+--+--**' markers that you get for HTML files. Is this a problem?
HTMERGE -v
"Deleted, no excerpt: x/http://.......PDF"
I get this message once for each of my PDFs.
I read in an earlier post that "Deleted, no excerpt" can be due to:
> > - disallowed in robots.txt
> > - indexing turned off by meta robots or noindex tags
> > - no indexable text in documents
> > - server_max_docs exceeded
> Also when merging:
> - duplicates between the two databases (oldest is removed)
These files aren't disallowed or turned off.
server_max_docs isn't set in my htdig.conf, and I don't think it would be a problem anyway,
as it's a small site (around 100 pages).
So I assume that there's no indexable text because the PDF parsing failed (even though there
were no error messages).
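To see whether the conversion step itself yields any text, one could run the converter by hand the way htdig would. The sketch below is a hypothetical diagnostic, not part of htdig. It assumes htdig appends the input PDF path and an output PostScript path to the pdf_parser command, and it uses a crude heuristic (counting PostScript string literals outside comment lines) as a stand-in for "extractable text"; text stored as hex strings or in other forms would be missed.

```python
import shlex
import subprocess
import sys
from pathlib import Path
from typing import List

# Converter command as given in the htdig.conf above.
PDF_PARSER = "/usr/local/bin/acroread -toPostScript -pairs"


def build_command(parser: str, pdf: str, ps: str) -> List[str]:
    """ASSUMPTION: htdig appends the input PDF and output PS paths in this order."""
    return shlex.split(parser) + [pdf, ps]


def count_ps_strings(ps_text: str) -> int:
    """Heuristic: count top-level "(...)" string literals, ignoring % comment lines."""
    body = "\n".join(
        line for line in ps_text.splitlines() if not line.lstrip().startswith("%")
    )
    count = depth = 0
    escaped = False
    for ch in body:
        if escaped:
            escaped = False          # skip the character after a backslash
        elif depth and ch == "\\":
            escaped = True
        elif ch == "(":
            if depth == 0:
                count += 1           # start of a new top-level string
            depth += 1               # PostScript allows balanced nested parens
        elif ch == ")" and depth:
            depth -= 1
    return count


def main(pdf: str) -> int:
    ps = str(Path(pdf).with_suffix(".ps"))
    try:
        result = subprocess.run(build_command(PDF_PARSER, pdf, ps))
    except FileNotFoundError:
        print("converter not found:", PDF_PARSER)
        return 1
    print("exit status:", result.returncode)
    out = Path(ps)
    if not out.exists() or out.stat().st_size == 0:
        print("no PostScript produced")
        return 1
    text = out.read_text(encoding="latin-1")
    print("PostScript bytes:", out.stat().st_size)
    print("string literals found:", count_ps_strings(text))
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Run it as `python3 check_conv.py sample.PDF` (both names are placeholders). A zero exit status with no string literals found would point at the converter output rather than at htdig's robots or server_max_docs handling.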
Any hints anyone?
Or should I just install xpdf and try that?
Thanks in advance,
Mike