According to htdig:
> After I put .pdf into the valid_extensions, I get this under rundig:
>
> Deleted, no excerpt: ID: 138 URL: http://www.acnord.dk/kursus/ugekurser.pdf
> Deleted, no excerpt: ID: 146 URL: http://www.acnord.dk/pdf/samlinger.pdf
>
> That is, now all the .pdf files are listed while running rundig, but why
> are they deleted?
First of all, you must be very careful with valid_extensions. When you
use it, you must list ALL the extensions that you want htdig to index,
because anything that's not on the list will be rejected (including .html
if you don't put that in)! The valid_extensions attribute is really for
environments where you need very tight control over what's indexed, i.e.
where there are so many extensions you DON'T want that listing them all
in bad_extensions is impractical.
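To illustrate the whitelist behaviour described above, here is a rough
shell sketch. The is_valid_ext function is a hypothetical helper made up
for this example, not part of htdig, and it only approximates the idea
with a simple suffix match; htdig's own matching may differ in details.

```shell
# Illustration only: mimic a whitelist of extensions.
# is_valid_ext is a hypothetical helper, not an htdig command.
# In htdig.conf the corresponding setting would look something like:
#   valid_extensions: .html .htm .pdf
is_valid_ext() {
    url=$1
    shift
    # Every remaining argument is an allowed extension.
    for ext in "$@"; do
        case "$url" in
            *"$ext") echo accepted; return 0 ;;
        esac
    done
    # Nothing on the list matched, so the URL is rejected.
    echo rejected
    return 1
}
```

With only .pdf on the list, a call such as
is_valid_ext http://www.acnord.dk/index.html .pdf
prints "rejected", which is why .html has to be listed as well.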
The reason the PDF files are being deleted by htmerge is, as the message
states, because they have no excerpt. In other words, when htdig tried
to index them, it was unable to find any indexable text in them. There
are two possible reasons for this. One is that the PDFs really don't
contain any text, just images (including possibly images of text). The
other possible reason, and this is the one you need to check out, is that
your external converter for PDF files is misconfigured, so it's not working
correctly.
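To tell the two cases apart, you can check whether any text can be
extracted from one of the PDFs at all. The sketch below assumes the
pdftotext utility from the xpdf package is installed; count_words is a
hypothetical helper name used only for this example.

```shell
# count_words reads text on stdin and prints the number of words in it.
# (Hypothetical helper for this example, not part of htdig or xpdf.)
count_words() {
    wc -w | tr -d ' '
}

# Usage, assuming pdftotext is installed and the path points at a real PDF:
#   pdftotext /full/path/to/my/file.pdf - | count_words
```

A result of 0 suggests the PDF holds only images, so there is nothing to
index. A clearly non-zero count suggests the text is there and the
converter setup is the more likely culprit.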
In your first message in this thread, you wrote:
> in doc2html following change:
> # PDF to HTML conversion script
> # Full pathname of Perl script pdf2html.pl
> my $PDF2HTML = '/usr/local/bin';
However, /usr/local/bin is not the full pathname of the Perl script
pdf2html.pl; it is just the pathname of the directory that contains it.
You need the full pathname of the script file! I.e. you need to set it to
my $PDF2HTML = '/usr/local/bin/pdf2html.pl';
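A quick way to catch this kind of mistake is to test what the configured
path actually points at. This is only a sketch; check_script_path is a
hypothetical helper name, and the paths in the usage lines are the ones
from this thread.

```shell
# check_script_path prints what kind of thing a path refers to.
# (Hypothetical helper for this example.)
check_script_path() {
    p=$1
    if [ -d "$p" ]; then
        echo "directory"    # wrong: this is where the script lives
    elif [ -f "$p" ]; then
        echo "file"         # right: this is the script itself
    else
        echo "missing"      # nothing exists at that path
    fi
}

# Usage:
#   check_script_path /usr/local/bin
#   check_script_path /usr/local/bin/pdf2html.pl
```

The value you put in $PDF2HTML should be one for which this reports
"file".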
If it still doesn't work after that, try running doc2html.pl manually on
one of your PDFs to see what output and/or errors it produces, e.g.:
$ /usr/local/bin/doc2html.pl /full/path/to/my/file.pdf application/pdf \
http://full/url/to/file.pdf /path/to/htdig.conf
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html