According to Gustave Stresen-Reuter:
> If I'm not mistaken, since your start_url is a pdf document, it's the
> only document that will get parsed and as far as I know, htdig is unable
> to follow links in a pdf document. Htdig is only able to follow links in
> html documents. Please correct me if I'm wrong on this last statement.
> 
> You'll probably need to create some sort of index document that has
> links to all the pdf files you want to index.
> 
> On Wednesday, October 8, 2003, at 09:19  AM, Natalya Kolesnikova wrote:
> 
> start_url: http://intranet.panasonic.de/pel/ipr/training_course/IPR_books_JPO/introduction_to_IPR.pdf

Natalya set the start_url this way at my recommendation (see earlier
postings in the thread) to determine whether the problem lies in htdig
actually indexing PDF files when given their URLs, as opposed to
finding the URLs of the PDFs in the first place.  Her test showed that
it failed with a single PDF file, which suggests a problem either with
that PDF file or with the setup of the external parser.  That's the next
stage of testing to tackle.
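
To help separate those two possibilities, here is a minimal sketch of a
check on the PDF file itself, independent of htdig.  It assumes you have
saved a local copy of the document fetched from the start_url; the
function name and the 1024-byte windows are my own choices for
illustration, not anything htdig provides.  It only checks for the
"%PDF-" header and the "%%EOF" marker, so a pass does not prove the file
is parseable, but a failure (for example, an HTML error or login page
saved under a .pdf name) points at the file rather than the parser.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Rough sanity check on raw file contents.

    A PDF normally carries a "%PDF-" header near the start and a
    "%%EOF" marker near the end.  The 1024-byte windows are an
    assumption made for this sketch.
    """
    has_header = b"%PDF-" in data[:1024]
    has_trailer = b"%%EOF" in data[-1024:]
    return has_header and has_trailer


def check_file(path: str) -> bool:
    """Read a locally saved copy of the document and apply the check."""
    with open(path, "rb") as handle:
        return looks_like_pdf(handle.read())
```

If check_file() returns False for the saved copy, I'd look at the file
(or at what the server is really sending back) before touching the
external parser setup.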

Once her configuration is working reliably for a single PDF, given the
URL, she'll be in a better position to check whether htdig is also
having problems finding the URLs from links in other documents.
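
For that later stage, the index document Gustave suggested could be
generated rather than written by hand.  The following is only a sketch
under my own assumptions: the function name and the page layout are
hypothetical, and the list of PDF URLs would have to be supplied by
whoever maintains the site.  It builds a plain HTML page with one link
per PDF, which is the kind of document htdig can follow links from.

```python
from html import escape


def build_index(urls):
    """Return a simple HTML page containing one link per URL."""
    items = "\n".join(
        '<li><a href="{0}">{1}</a></li>'.format(escape(u, quote=True), escape(u))
        for u in urls
    )
    return (
        "<html><head><title>PDF index</title></head><body>\n"
        "<ul>\n" + items + "\n</ul>\n"
        "</body></html>\n"
    )
```

The resulting page would be saved somewhere on the web server and its
URL used as the start_url in place of the single PDF.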

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

