With more information we can make better informed guesses.
Do you ONLY get "no excerpt" with PDF files?
Are you using doc2html or conv_doc to index other types of document, and are
they OK?
Are you doing a simple run of htdig followed by htmerge, or something more
complicated, such as merging two or more runs of htdig?
Have you tried producing a log from doc2html? This will report how many
bytes of text it has extracted from each file.
Do the PDF files contain words which are not in your bad words list?
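
To make that last check concrete, here is a rough Python sketch, not part of
ht://Dig or doc2html. It runs the converter using the command form quoted
further down in this message and counts the words that survive a bad words
list. The helper names, the crude tokenising, the minimum word length of 3,
and the assumption that doc2html.pl is on your PATH and that the bad words
file is whitespace-separated are all mine; htdig's real word rules will
differ.

```python
import re
import subprocess
import sys


def indexable_words(text, bad_words, min_length=3):
    """Return words in text that are long enough and not in bad_words.

    Hypothetical helper: only a rough approximation of what an indexer keeps.
    """
    # The converter emits HTML, so drop anything that looks like a tag.
    plain = re.sub(r"<[^>]*>", " ", text)
    words = re.findall(r"[A-Za-z0-9]+", plain.lower())
    bad = {w.lower() for w in bad_words}
    return [w for w in words if len(w) >= min_length and w not in bad]


def convert(pdf_path, converter="doc2html.pl"):
    """Run the converter on one PDF and return its standard output as bytes.

    Assumes the converter name given here can be found on PATH.
    """
    result = subprocess.run(
        [converter, pdf_path, "application/pdf"],
        capture_output=True,
        check=False,
    )
    return result.stdout


# Usage (assumed file names): python check_pdf.py some.pdf bad_words
if __name__ == "__main__" and len(sys.argv) == 3:
    output = convert(sys.argv[1])
    with open(sys.argv[2], encoding="latin-1") as handle:
        bad_words = handle.read().split()
    words = indexable_words(output.decode("latin-1", errors="replace"), bad_words)
    print(f"{len(output)} bytes of converter output, {len(words)} candidate words")
```

If this reports zero bytes, the converter is the place to look; if it reports
plenty of text but no candidate words, look at the bad words list instead.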
--
David Adams
Computing Services
Southampton University
----- Original Message -----
From: "Per-Henrik Persson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, July 24, 2001 8:13 AM
Subject: Re: [htdig] DELETED, no excerpt on PDF's
> * David Adams <[EMAIL PROTECTED]> [010717 18:10]:
> > You haven't mentioned any warning messages from doc2html, so it must be
> > doing something?
> >
> > Have you tried doc2html from the command line? The format is:
> >
> > doc2html.pl filename application/pdf
> >
> > Check that the output does contain text extracted from the file.
> >
> > If that is OK, then the problem may be in your configuration file. Check
> > that the external_parsers attribute is used correctly.
> >
> > > Hi!
> > >
> > > I've just installed and configured ht://dig on a server at work.
> > > Everything works GREAT, except for the indexing of pdf-files :(
> > >
> > > I've read all the previous posts to the mailing list without finding
> > > a solution to my problem. I have tried both doc2html and conv_doc as
> > > external converters to parse pdf-files.
> > >
> > > All the files get indexed when I run htdig but when I run htmerge I
> > > get:
> > > Deleted, no excerpt: 209/http://www.foo.bar/xxx.pdf
> > > on all the pdf-files.
> > >
> > > I have set the max doc size to well above the largest pdf-file. The
> > > pdf-files do have text-content -- I can even run conv_doc with the
> > > necessary options and redirect the output to a .html-file. Then
> > > htmerge gladly accepts it!
> > >
> > > Please help me...
> > >
> > > Thanks in advance,
> > >
> > > P-H
> > >
>
> Still nobody has a solution?
>
> /P-H