* David Adams <[EMAIL PROTECTED]> [010724 15:54]:
> With more information we can make better informed guesses.
>
> Do you ONLY get "no excerpt" with PDF files?
> Are you using doc2html or conv_doc to index other types of document, and are
> they OK?
I only use doc2html to index PDF files; I don't have any other files that
I'm interested in indexing.
> Are you doing a simple run of htdig followed by htmerge, or something more
> complicated, such as merging two or more runs of htdig?
First I run a simple "htdig -v -a -i" and get the usual output
while indexing. For one PDF file that is:
207:209:7:http://www.citu.lu.se/cituverkstad/allmant/mjukvara/manualer/flash4_SW.pdf:
size = 2660517
That part seems fine...
Then I run "htmerge -vvv -a" and get a lot of output. For the PDF file
it is:
202/http://www.citu.lu.se/cituverkstad/allmant/mjukvara/illustrator.htm
Deleted, no excerpt:
209/http://www.citu.lu.se/cituverkstad/allmant/mjukvara/manualer/flash4_SW.pdf
188/http://www.citu.lu.se/cituverkstad/allmant/mjukvara/mediacleaner.html
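Since the htmerge output is long, a small script can pull out just the documents it dropped. The following is a minimal sketch, assuming the verbose output looks like the lines quoted above: a "Deleted, no excerpt:" marker with the "<id>/<URL>" entry either on the same line or on the next one. The function name and the regular expression are illustrative and not part of ht://Dig.

```python
import re

# Assumed entry shape, taken from the output quoted above: "<doc id>/<URL>".
ENTRY = re.compile(r"(\d+)/(\S+)")
MARKER = "Deleted, no excerpt:"


def deleted_no_excerpt(lines):
    """Return (doc_id, url) pairs reported as deleted for lack of an excerpt."""
    found = []
    pending = False  # marker seen, entry expected on the next non-empty line
    for line in lines:
        text = line.strip()
        if MARKER in text:
            # The entry may follow the marker on the same line.
            rest = text.split(MARKER, 1)[1].strip()
            match = ENTRY.match(rest)
            if match:
                found.append((int(match.group(1)), match.group(2)))
                pending = False
            else:
                pending = True
            continue
        if pending and text:
            match = ENTRY.match(text)
            if match:
                found.append((int(match.group(1)), match.group(2)))
            pending = False
    return found
```

To use it, save the htmerge output to a file (for example by redirecting both stdout and stderr), then pass the open file object to `deleted_no_excerpt`.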
> Have you tried producing a log from doc2html? - This will report on how many
> bytes of text it has extracted from each file.
No, I haven't tried using log files, but when I run doc2html manually on
the PDF file above I get a _large_ HTML file that is completely valid HTML.
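A large, valid HTML file does not by itself show how much text it contains, so it may be worth measuring that directly. The sketch below is a rough check using only Python's standard library. It is an assumption about what is useful to inspect, not a description of how doc2html or htdig extract text, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect character data, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def visible_text(html):
    """Return the text content of an HTML string with whitespace collapsed."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

Running `len(visible_text(html))` on the file produced by the manual doc2html run gives a rough figure to compare with the byte count a doc2html log would report.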
> Do the PDF files contain words which are not in your bad words list?
Yes...
Thanx,
P-H
*******************************************************************************
Per-Henrik Persson 0703-68 53 86
[EMAIL PROTECTED] http://www.whatever.nu
"Just because something doesn't work, it doesn't mean it can't be used..."
*******************************************************************************