* David Adams <[EMAIL PROTECTED]> [010717 18:10]:
> You haven't mentioned any warning messages from doc2html, so it must be doing
> something?
>
> Have you tried doc2html from the command line? The format is:
>
> doc2html.pl filename application/pdf
>
> Check that the output does contain text extracted from the file.
>
The output is perfectly valid HTML, with a lot of extracted text from
the pdf. This part seems to work...
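
For anyone who wants to repeat this check without eyeballing the output, here
is a rough sketch in Python. It only measures how much visible text there is in
whatever HTML the converter printed. The names `visible_text` and
`_TextCollector` are my own placeholders, and this does not try to reproduce
how htmerge decides what counts as an excerpt.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping <script>, <style> and <title> content."""

    SKIP = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that sits outside the skipped elements.
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_text(html):
    """Return the whitespace-normalised text found outside title/script/style."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(" ".join(collector.chunks).split())
```

Redirecting the output of the doc2html.pl command above to a file and passing
that file's contents to `visible_text` shows how much text is really there. An
empty result would mean the page has nothing outside its title.
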
> If that is OK, then the problem may be in your configuration file. Check
> that the external_parsers attribute is used correctly.
>
As I said, htdig really does run doc2html.pl, but then htmerge deletes the
files with the message "Deleted, no excerpt: 209/http:/..."
The external_parsers part in htdig.conf looks like the following:
--snip
external_parsers: application/pdf->text/html /opt/www/htdig/bin/doc2html.pl
--snip
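
To rule out a typo in that line, a small Python sketch like the one below can
split the value and check that the program is executable. It assumes the value
is a list of pairs (a content-type spec, then a program path, optionally
quoted), which is how I understand the attribute. The function names
`parse_external_parsers` and `check_programs` are made up for illustration, and
this is not the parser htdig itself uses.

```python
import os
import shlex


def parse_external_parsers(value):
    """Split an external_parsers value into (input type, output type, program).

    Assumption: the value consists of pairs, a content-type spec followed by
    a program string. A spec written as "a->b" names a converter; without
    "->" the output type is reported as None.
    """
    tokens = shlex.split(value)  # honours quoted program strings
    if len(tokens) % 2 != 0:
        raise ValueError("expected pairs of <content-type> <program>")
    entries = []
    for spec, program in zip(tokens[0::2], tokens[1::2]):
        in_type, arrow, out_type = spec.partition("->")
        entries.append((in_type, out_type if arrow else None, program))
    return entries


def check_programs(entries):
    """Return the program paths that are missing or not executable here."""
    missing = []
    for _, _, program in entries:
        path = program.split()[0]  # drop any parameters after the path
        if not os.access(path, os.X_OK):
            missing.append(path)
    return missing
```

Feeding it the value from the line above should give a single entry, from
application/pdf to text/html, and `check_programs` should return an empty list
when run on the server where the script is installed.
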
Still doesn't work... As I said before, the same behavior occurred when I
tried using conv_doc.pl.
/P-H
> > Hi!
> >
> > I've just installed and configured ht://dig on a server at work.
> > Everything works GREAT, except for the indexing of pdf-files :(
> >
> > I've read all the previous posts to the mailing list without finding a
> > solution to my problem. I have tried both doc2html and conv_doc as external
> > converters to parse pdf-files.
> >
> > All the files get indexed when I run htdig but when I run htmerge I get:
> > Deleted, no excerpt: 209/http://www.foo.bar/xxx.pdf
> > on all the pdf-files.
> >
> > I have set the max doc size to well above the largest pdf-file. The
> > pdf-files do have text-content -- I can even run conv_doc with the
> > necessary options and redirect the output to a .html-file. Then htmerge
> > gladly accepts it!
> >
> > Please help me...
> >
> > Thanks in advance,
> >
> > P-H
> >
> >
> > *******************************************************************************
> > Per-Henrik Persson 0703-68 53 86
> > [EMAIL PROTECTED] http://www.whatever.nu
> >
> > "Just because something doesn't work, it doesn't mean it can't be used..."
> >
> > *******************************************************************************
> >
> > _______________________________________________
> > htdig-general mailing list <[EMAIL PROTECTED]>
> > To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
> > subject of unsubscribe
> > FAQ: http://htdig.sourceforge.net/FAQ.html
> >
>