Re: [htdig] DELETED, no excerpt on PDF's

Per-Henrik Persson Tue, 24 Jul 2001 06:59:28 -0700
* David Adams <[EMAIL PROTECTED]> [010724 15:54]:
> With more information we can make better informed guesses.
> 
> Do you ONLY get "no excerpt" with PDF files?



> Are you using doc2html or conv_doc to index other types of document, and are
> they OK?

I only use doc2html to index pdf-files; I don't have any other files that
I'm interested in indexing.

> Are you doing a simple run of htdig followed by htmerge, or something more
> complicated, such as merging two or more runs of htdig?

First I run a simple "htdig -v -a -i"... Then I get the usual output
while indexing. For one pdf-file that is:

207:209:7:http://www.citu.lu.se/cituverkstad/allmant/mjukvara/manualer/flash4_SW.pdf:  
 size = 2660517

That part seems fine...

Then I run "htmerge -vvv -a" and get a lot of output... for the pdf-file
it is:

202/http://www.citu.lu.se/cituverkstad/allmant/mjukvara/illustrator.htm
Deleted, no excerpt: 
209/http://www.citu.lu.se/cituverkstad/allmant/mjukvara/manualer/flash4_SW.pdf
188/http://www.citu.lu.se/cituverkstad/allmant/mjukvara/mediacleaner.html
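
With a lot of htmerge output it can help to pull out only the entries
flagged "Deleted, no excerpt". The following is a minimal sketch, not
part of ht://Dig: the function name deleted_no_excerpt is hypothetical,
and it assumes the verbose output has been saved as text and that each
entry looks like "id/url" as in the lines quoted above. Because the
output above may have been wrapped by the mail client, the sketch accepts
the entry either on the same line as the marker or on the next line.

```python
import re

# An "id/url" entry, as seen in the htmerge output quoted above.
ENTRY = re.compile(r"^(\d+)/(\S+)$")
MARKER = "Deleted, no excerpt:"


def deleted_no_excerpt(log_text):
    """Return (doc_id, url) pairs that htmerge reported as deleted
    because no excerpt was available."""
    results = []
    pending = False  # True when the marker was seen without an entry yet
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith(MARKER):
            rest = line[len(MARKER):].strip()
            if rest:
                # Marker and entry on the same line.
                match = ENTRY.match(rest)
                if match:
                    results.append((int(match.group(1)), match.group(2)))
                pending = False
            else:
                # Entry is expected on the next non-empty line.
                pending = True
            continue
        if pending and line:
            match = ENTRY.match(line)
            if match:
                results.append((int(match.group(1)), match.group(2)))
            pending = False
    return results


# The three lines of output quoted in this message.
sample = (
    "202/http://www.citu.lu.se/cituverkstad/allmant/mjukvara/illustrator.htm\n"
    "Deleted, no excerpt: \n"
    "209/http://www.citu.lu.se/cituverkstad/allmant/mjukvara/manualer/flash4_SW.pdf\n"
    "188/http://www.citu.lu.se/cituverkstad/allmant/mjukvara/mediacleaner.html\n"
)

for doc_id, url in deleted_no_excerpt(sample):
    print(doc_id, url)
```

If the real log format differs from the lines quoted here, the ENTRY
pattern would need adjusting.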


> Have you tried producing a log from doc2html? - This will report on how many
> bytes of text it has extracted from each file.

No, I haven't tried using logfiles but when I run doc2html manually on
the pdf-file above I get a _large_ html-file that is totally valid html.

> Do the PDF files contain words which are not in your bad words list?

Yes...

Thanx,

P-H

******************************************************************************* 
Per-Henrik Persson                          0703-68 53 86
[EMAIL PROTECTED]                              http://www.whatever.nu

"Just because something doesn't work, it doesn't mean it can't be used..."
*******************************************************************************

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html