I have the following in htdig.conf:
external_parsers: application/rtf->text/html /usr/local/bin/doc2html.pl \
text/rtf->text/html /usr/local/bin/doc2html.pl \
application/pdf->text/html /usr/local/bin/doc2html.pl \
application/postscript->text/html /usr/local/bin/doc2html.pl
In /usr/local/bin I have the following files:
-rw-r--r-- 1 root root 2207 aug 30 00:46 acroconv.pl
-rw-r--r-- 1 root root 17000 aug 29 11:55 doc2html.pl
-rw-r--r-- 1 root root 2368 aug 30 00:48 parsepdf.pl
-rw-r--r-- 1 root root 4083 aug 29 11:44 pdf2html.pl
-rw-r--r-- 1 root root 1324 aug 29 11:45 swf2html.pl
In doc2html.pl I made the following change:
# PDF to HTML conversion script
# Full pathname of Perl script pdf2html.pl
my $PDF2HTML = '/usr/local/bin';
and there is the following section (of which I don't understand much):
# Adobe PDF file using Perl script
if ($PDF2HTML) {
$mime_type = "application/pdf";
$cmd = $PDF2HTML;
# Replace default title (if used) with filename:
$cmdl = "$cmd $Input $mime_type $name";
$magic = '%PDF-|\0PDF CARO\001\000\377';
&store_html_method('PDF (pdf2html)',$cmd,$cmdl,$mime_type,$magic);
}
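
The comment in doc2html.pl asks for the full pathname of pdf2html.pl, and htdig presumably has to execute the converter scripts, so a quick sanity check on the paths may be worthwhile. A minimal sketch, assuming a POSIX shell; check_script is a hypothetical helper name, not part of htdig:

```shell
# Hypothetical helper: report whether a path looks like a runnable script.
check_script() {
    if [ -d "$1" ]; then
        echo "directory"          # a directory, not a script file
    elif [ ! -f "$1" ]; then
        echo "missing"            # no regular file at this path
    elif [ ! -x "$1" ]; then
        echo "not executable"     # file exists but has no usable execute bit
    else
        echo "ok"
    fi
}

# Example use (paths taken from the listing and the $PDF2HTML setting above):
#   check_script /usr/local/bin                 # the value currently in $PDF2HTML
#   check_script /usr/local/bin/doc2html.pl
#   check_script /usr/local/bin/pdf2html.pl
```

Anything other than "ok" for a script that htdig or doc2html.pl is expected to run would be worth looking at first.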
In pdf2html.pl:
#### YOU MUST SET THESE ####
my $PDFTOTEXT = "/usr//bin/pdftotext";
my $PDFINFO = "/usr/bin/pdfinfo";
#
and in /usr/bin there are the following files:
[root@WebSrv bin]# ls /usr/bin/pd*
/usr/bin/pdf2dsc /usr/bin/pdfimages /usr/bin/pdftopbm /usr/bin/pdftotext
/usr/bin/pdf2ps /usr/bin/pdfinfo /usr/bin/pdftops /usr/bin/pdiff
When I run rundig, some of the output lines show:
28:138:1:http://www.acnord.dk/pdf/?N=D: *****-------- size = 1486
30:139:1:http://www.acnord.dk/pdf/?M=A: *+***-------- size = 1486
31:140:1:http://www.acnord.dk/pdf/?S=A: **+**-------- size = 1486
That is, the directory /pdf/ contains some of the PDF files, but their
names don't show up.
When I run htdig -vv, some lines show:
344:417:1:http://www.acnord.dk/pdf/?M=A: (changed)
title: Index of /pdf
*****
url rejected: (level 1)http://www.acnord.dk/pdf/ugekurser.pdf
url rejected: (level 1)http://www.acnord.dk/pdf/ugekurser0203.pdf
url rejected: (level 1)http://www.acnord.dk/pdf/vovkatalog.pdf
url rejected: (level 1)http://www.acnord.dk/pdf/op10-lo.mp3
url rejected: (level 1)http://www.acnord.dk/pdf/op10.mp3
url rejected: (level 1)http://www.acnord.dk/pdf/SFO-IT.pdf
url rejected: (level 1)http://www.acnord.dk/pdf/AVG.pdf
url rejected: (level 1)http://www.acnord.dk/pdf/samlinger.pdf
size = 1486
I can't figure out why they are rejected (their extensions are not in the bad_extensions list).
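
To see at a glance which URLs are affected, the rejected entries can be pulled out of the verbose output. A minimal sketch, assuming the "url rejected: (level N)" line format shown above; rejected_urls is a hypothetical helper name, and the config path in the usage comment is a placeholder:

```shell
# Hypothetical helper: print the URLs that htdig reported as rejected,
# one per line, given its verbose output on stdin.
rejected_urls() {
    sed -n 's/^[[:space:]]*url rejected: (level [0-9][0-9]*)//p'
}

# Example use (config path is a placeholder):
#   htdig -vv -c /path/to/htdig.conf 2>&1 | rejected_urls
# Count rejections per file extension:
#   htdig -vv -c /path/to/htdig.conf 2>&1 | rejected_urls | sed 's/.*\.//' | sort | uniq -c
```

Since both .pdf and .mp3 files are rejected here, attributes other than bad_extensions (for example limit_urls_to, exclude_urls or valid_extensions) may be worth comparing against these URLs. Raising the verbosity further may also print a reason for each rejection, though that is an assumption about this htdig version.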
--
One thing that concerns me is that my server (RH7) runs in text mode only.
Do I have to run startx in order for xpdf to work?
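
As far as I know, pdftotext and pdfinfo are plain command-line tools from the xpdf package, and only the xpdf viewer itself needs a display. One way to confirm this on the server is to run them with DISPLAY unset. A minimal sketch, assuming a POSIX shell; run_without_display is a hypothetical helper name and the PDF path is a placeholder:

```shell
# Hypothetical helper: run a command with DISPLAY unset, as on a text-mode console.
# The subshell keeps the caller's own DISPLAY setting untouched.
run_without_display() {
    ( unset DISPLAY; "$@" )
}

# Example use (the PDF path is a placeholder):
#   run_without_display /usr/bin/pdftotext /path/to/some.pdf - | head
#   run_without_display /usr/bin/pdfinfo /path/to/some.pdf
```

If both commands print text without complaining about a display, X should not be needed for indexing.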
yours
finn
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html