[htdig] PDF indexing problem: Deleted, no excerpt

Mike Gardner Wed, 09 Aug 2000 05:47:22 -0700

I have htdig 3.1.5 happily installed on my SuSE 6.4 box, but it doesn't seem to be 
indexing PDF files.
Here are some details:

HTDIG.CONF
pdf_parser /usr/local/bin/acroread -toPostScript -pairs
max_doc_size 300000

(acroread is version 3.1 and will happily convert a sample PDF to PS; all PDFs are 
well under the max_doc_size)
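
For reference, a minimal Python sketch of how the size claim could be double-checked. It assumes the limit is compared against the size of the PDF file on disk; the function name and the example path are hypothetical, only the 300000 figure comes from the config above.

```python
import os

MAX_DOC_SIZE = 300000  # max_doc_size value from the config above


def oversized_pdfs(root, limit=MAX_DOC_SIZE):
    """Return sorted (path, size) pairs for PDFs under root larger than limit bytes."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match .pdf and .PDF alike
            if name.lower().endswith(".pdf"):
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                if size > limit:
                    hits.append((path, size))
    return sorted(hits)


# Example call (hypothetical document root, adjust to the real one):
# for path, size in oversized_pdfs("/path/to/docroot"):
#     print(size, path)
```

An empty result would mean no PDF exceeds the limit, so truncation by max_doc_size is unlikely to be the cause.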

HTDIG -v
lists the PDF files & their sizes OK (i.e. it looks as though they are being indexed),
however I don't see the '+--+--**' that you get for HTML files - is this a problem?

HTMERGE -v
"Deleted, no excerpt: x/http;//.......PDF"
I get this message once for each of my PDFs
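
To confirm that the count really matches the number of PDFs, the verbose output could be filtered with something like the sketch below. It assumes each message sits on its own line in the form quoted above; the function name is hypothetical, and the second sample line is a placeholder, not real htmerge output.

```python
PREFIX = "Deleted, no excerpt:"


def deleted_no_excerpt(lines):
    """Return the entries reported as 'Deleted, no excerpt' in the given lines."""
    entries = []
    for line in lines:
        # Drop surrounding whitespace and any enclosing double quotes
        text = line.strip().strip('"')
        if text.startswith(PREFIX):
            entries.append(text[len(PREFIX):].strip())
    return entries


# Sample input: the first line is the message as quoted above (URL elided
# as in the original); the second is a placeholder for unrelated output.
sample = [
    '"Deleted, no excerpt: x/http;//.......PDF"',
    "placeholder for some other line",
]
entries = deleted_no_excerpt(sample)
pdfs = [e for e in entries if e.lower().endswith(".pdf")]
print(len(entries), "deleted,", len(pdfs), "of them PDFs")
```

Feeding the captured output of HTMERGE -v to the same function, line by line, gives the list of affected documents to compare against the PDFs on the site.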

I read in an earlier post that "Deleted, no excerpt" can be due to:
> > - disallowed in robots.txt 
> > - indexing turned off by meta robots or noindex tags 
> > - no indexable text in documents 
> > - server_max_docs exceeded 
> Also when merging: 
> - duplicates between the two databases (oldest is removed) 

These files aren't disallowed / turned off.
server_max_docs isn't set in my htdig.conf - I don't think that this will be a problem 
as it's a small site (around 100 pages).
So I assume that there's no indexable text because the PDF parsing failed (even though 
there were no error messages).

Any hints anyone?
Or should I just install xpdf and try that?

thanks in advance,
mike



