On Fri, 25 Jan 2002, garsila Ndzmande wrote:
> Date: Fri, 25 Jan 2002 23:07:58 -0500
> From: garsila Ndzmande <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig] rundig fails during PDF indexing
>
> >This may be the kind of problem that can only be solved on a case by case
> >basis, and unfortunately cannot be isolated without a -v redig;( There
> >are many PDF and MSWord documents, Excel sheets, etc., in our site that
> >we can browse without a problem; however, htdig would spit out error
> >messages during the dig;(
> >
> >For example there is an excel file in our site, about 40K (well under
> >max_doc_size), which would cause rundig to generate an error message,
> >reading something like: malloc could not allocate memory..., but it would
> >not dump core or stop digging. I ended up running htdig with a -v to find
> >the culprit file name, and excluding it from the dig;)
> >
> >Regards,
> >.
> >Joe
>
> Thanks for the reply. Running htdig with -vv found several hundred PDF
> files in languages other than English; it also found damaged or protected
> PDF files. I am using multidig to index a very large organization's
> website with many large departments.
> What is the best way to exclude these files? Of course, adding their names to
> exclude_urls may not work, since the same names could be used somewhere else.
Please do not exclude the mailing list from the Cc.
You should add as much of the URL as necessary to make it unique; e.g. if
you have popular.pdf in:
/Culprit/Path/popular.pdf
/innoce1/Path/popular.pdf
/innoce2/Path/popular.pdf
...
You only need to exclude /Culprit/Path/popular.pdf. All the others will
be unaffected.
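
As a rough sketch, the corresponding line in the configuration file could
look like the one below. I am assuming here that you still have the stock
exclude_urls value (/cgi-bin/ .cgi) and that the patterns are matched as
plain substrings of the URL; the path is the made-up one from the example
above, so substitute your own:

```
# Hypothetical example: keep the existing patterns and append the
# full path of the one file that breaks the dig.
exclude_urls: /cgi-bin/ .cgi /Culprit/Path/popular.pdf
```

Patterns are separated by whitespace, so further culprit paths can be
appended to the same line.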
> Can the exclude directive read from a file?
No.
Regards,
Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah [EMAIL PROTECTED]