I have the following in htdig.conf:
external_parsers: application/rtf->text/html /usr/local/bin/doc2html.pl \
                  text/rtf->text/html /usr/local/bin/doc2html.pl \
                  application/pdf->text/html /usr/local/bin/doc2html.pl \
                  application/postscript->text/html /usr/local/bin/doc2html.pl
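Each external_parsers entry is a MIME-type mapping followed by the command that handles it; the `->text/html` part says the converter writes HTML, which htdig then indexes. The shell sketch below (the function name `list_parsers` is hypothetical) splits a value of this form into its entries and flags any mapping that has no command after it, which is what a broken continuation line would produce. It assumes every entry uses the `input->output command` form shown above and that the backslash continuations have been joined into one line.

```shell
#!/bin/sh
# Hypothetical helper: print each external_parsers entry as
# "input type -> output type : command", flagging mappings with no command.
# Assumes all entries use the "in->out command" form.
list_parsers() {
    set -f            # no globbing while splitting the value into words
    set -- $1
    set +f
    while [ $# -gt 0 ]; do
        mapping=$1
        shift
        case ${1-} in
            ''|*'->'*)
                # Next word is absent or is another mapping: no command given
                echo "${mapping%%->*} -> ${mapping#*->} : MISSING COMMAND" ;;
            *)
                echo "${mapping%%->*} -> ${mapping#*->} : $1"
                shift ;;
        esac
    done
}
```

For example, `list_parsers 'application/pdf->text/html /usr/local/bin/doc2html.pl application/postscript->text/html'` prints `application/pdf -> text/html : /usr/local/bin/doc2html.pl` followed by `application/postscript -> text/html : MISSING COMMAND`.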

In /usr/local/bin I have the following files:
-rw-r--r--    1 root     root         2207 aug 30 00:46 acroconv.pl
-rw-r--r--    1 root     root        17000 aug 29 11:55 doc2html.pl
-rw-r--r--    1 root     root         2368 aug 30 00:48 parsepdf.pl
-rw-r--r--    1 root     root         4083 aug 29 11:44 pdf2html.pl
-rw-r--r--    1 root     root         1324 aug 29 11:45 swf2html.pl

In doc2html.pl I made the following change:
# PDF to HTML conversion script
# Full pathname of Perl script pdf2html.pl
my $PDF2HTML = '/usr/local/bin/pdf2html.pl';
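The comment asks for the full pathname of pdf2html.pl, so $PDF2HTML should name the script file itself, not the directory that contains it. The ls listing above also shows every script in /usr/local/bin as -rw-r--r--, with no execute bit. Assuming doc2html.pl runs pdf2html.pl directly rather than through `perl`, that script needs the execute bit, and the same goes for doc2html.pl, which htdig runs as the external parser. A minimal shell check, with a hypothetical function name `check_converter`:

```shell
#!/bin/sh
# Hypothetical helper: report whether a configured converter path is an
# executable regular file. A directory such as /usr/local/bin fails the check.
check_converter() {
    path=$1
    if [ -d "$path" ]; then
        echo "$path: is a directory, expected the full path of a script"
        return 1
    elif [ ! -f "$path" ]; then
        echo "$path: not found"
        return 1
    elif [ ! -x "$path" ]; then
        echo "$path: exists but is not executable (e.g. chmod 755 it)"
        return 1
    fi
    echo "$path: ok"
}
```

Usage: `check_converter /usr/local/bin/pdf2html.pl` and `check_converter /usr/local/bin/doc2html.pl`.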

I also have the following section in doc2html.pl (most of which I don't understand):
  # Adobe PDF file using Perl script
  if ($PDF2HTML) {
    $mime_type = "application/pdf";
    $cmd = $PDF2HTML;
    # Replace default title (if used) with filename:
    $cmdl = "$cmd $Input $mime_type $name";
    $magic = '%PDF-|\0PDF CARO\001\000\377';
    &store_html_method('PDF (pdf2html)',$cmd,$cmdl,$mime_type,$magic);
  }
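The `$magic` value in this section appears to be a regular expression that doc2html.pl compares against the beginning of a file to confirm its type before choosing the PDF converter. Ordinary PDF files begin with the five bytes `%PDF-`, which is the first alternative in the pattern. The shell sketch below (hypothetical function name `looks_like_pdf`) performs only that first check; the second, binary alternative is not covered, and doc2html.pl's own matching may differ in detail.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file starts with the "%PDF-" signature,
# i.e. the first alternative of the $magic pattern. The "\0PDF CARO..."
# alternative is deliberately not checked.
looks_like_pdf() {
    [ "$(head -c 5 "$1")" = "%PDF-" ]
}
```

Run against a local copy of one of the files, e.g. `looks_like_pdf AVG.pdf && echo "PDF signature found"`.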

In pdf2html.pl:
#### YOU MUST SET THESE  ####
my $PDFTOTEXT = "/usr/bin/pdftotext";
my $PDFINFO = "/usr/bin/pdfinfo";
#
And in /usr/bin I have the following files:
[root@WebSrv bin]# ls /usr/bin/pd*
/usr/bin/pdf2dsc  /usr/bin/pdfimages  /usr/bin/pdftopbm  /usr/bin/pdftotext
/usr/bin/pdf2ps   /usr/bin/pdfinfo    /usr/bin/pdftops   /usr/bin/pdiff

When I run rundig, some of the output lines show:
28:138:1:http://www.acnord.dk/pdf/?N=D: *****-------- size = 1486
30:139:1:http://www.acnord.dk/pdf/?M=A: *+***-------- size = 1486
31:140:1:http://www.acnord.dk/pdf/?S=A: **+**-------- size = 1486
That is, the directory /pdf/ contains some of the PDF files, but their
names don't show up.

When I run htdig -vv, some lines show:
344:417:1:http://www.acnord.dk/pdf/?M=A:  (changed) 
title: Index of /pdf
*****
url rejected: (level 1)http://www.acnord.dk/pdf/ugekurser.pdf
url rejected: (level 1)http://www.acnord.dk/pdf/ugekurser0203.pdf
url rejected: (level 1)http://www.acnord.dk/pdf/vovkatalog.pdf
url rejected: (level 1)http://www.acnord.dk/pdf/op10-lo.mp3
url rejected: (level 1)http://www.acnord.dk/pdf/op10.mp3
url rejected: (level 1)http://www.acnord.dk/pdf/SFO-IT.pdf
url rejected: (level 1)http://www.acnord.dk/pdf/AVG.pdf
url rejected: (level 1)http://www.acnord.dk/pdf/samlinger.pdf
 size = 1486

I can't figure out why they are rejected (they are not in the bad_extensions list).

--
One thing that concerns me is that my server (RH7) runs in text mode only.
Do I have to run startx for xpdf to work?


Yours,
Finn



_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>

Reply via email to