When I run htmerge, I get the following message:

htmerge: Document database has no URLs. Check your config file and try running htdig again.

Thank you for your tips!
Natalya

> Thank you very much for your help!
> I don't get any error message, but I never get .pdf files in my
> search list!!!
> Here is the htdig -ivvv output when start_url is a single PDF file.
> What is wrong???
>
> [EMAIL PROTECTED]:~> htdig -ivvv
>
> 1:1:http://intranet.panasonic.de/pel/ipr/training_course/IPR_books_JPO/introduction_to_IPR.pdf
> New server: intranet.panasonic.de, 80
> Retrieval command for http://intranet.panasonic.de/robots.txt: GET /robots.txt HTTP/1.0
> User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
> Host: intranet.panasonic.de
>
> Header line: HTTP/1.1 200 OK
> Header line: Date: Wed, 08 Oct 2003 08:36:24 GMT
> Header line: Server: Apache/1.3.27 (Linux/SuSE) PHP/4.3.1
> Header line: Last-Modified: Tue, 21 Aug 2001 22:00:00 GMT
> Converted Tue, 21 Aug 2001 22:00:00 GMT to Tue, 21 Aug 2001 22:00:00
> Header line: ETag: "44005-e7-3b82d9e0"
> Header line: Accept-Ranges: bytes
> Header line: Content-Length: 231
> Header line: Connection: close
> Header line: Content-Type: text/plain
> Header line:
> returnStatus = 0
> Read 231 from document
> Read a total of 231 bytes
> Parsing robots.txt file using myname = htdig
> Robots.txt line: # exclude help system from robots
> Robots.txt line: User-agent: *
> Found 'user-agent' line: *
> Robots.txt line: Disallow: /manual/
> Found 'disallow' line: /manual/
> Robots.txt line: Disallow: /doc/
> Found 'disallow' line: /doc/
> Robots.txt line: Disallow: /gif/
> Found 'disallow' line: /gif/
> Robots.txt line: # but allow htdig to index our doc-tree
> Robots.txt line: User-agent: susedig
> Found 'user-agent' line: susedig
> Robots.txt line: Disallow:
> Robots.txt line: # disallow stress test
> Robots.txt line: user-agent: stress-agent
> Found 'user-agent' line: stress-agent
> Robots.txt line: Disallow: /
> Pattern: /manual/|/doc/|/gif/
> pushed
> pick: intranet.panasonic.de, # servers = 1
> 0:0:0:http://intranet.panasonic.de/pel/ipr/training_course/IPR_books_JPO/introduction_to_IPR.pdf: Retrieval command for http://intranet.panasonic.de/pel/ipr/training_course/IPR_books_JPO/introduction_to_IPR.pdf: GET /pel/ipr/training_course/IPR_books_JPO/introduction_to_IPR.pdf HTTP/1.0
> User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
> Host: intranet.panasonic.de
>
> Header line: HTTP/1.1 200 OK
> Header line: Date: Wed, 08 Oct 2003 08:36:24 GMT
> Header line: Server: Apache/1.3.27 (Linux/SuSE) PHP/4.3.1
> Header line: Last-Modified: Fri, 29 Aug 2003 11:25:19 GMT
> Converted Fri, 29 Aug 2003 11:25:19 GMT to Fri, 29 Aug 2003 11:25:19
> Header line: ETag: "314005-51e38-3f4f381f"
> Header line: Accept-Ranges: bytes
> Header line: Content-Length: 335416
> Header line: Connection: close
> Header line: Content-Type: application/pdf
> Header line:
> returnStatus = 0
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 7736 from document
> Read a total of 335416 bytes
> size = 335416
> pick: intranet.panasonic.de, # servers = 1
> [EMAIL PROTECTED]:~>
>
> > According to Natalya Kolesnikova:
> > > Maybe I am stupid, but it doesn't work for me! Can somebody help me? I have
> > > tried with acroread and with the external parser xpdf, but it doesn't work!!!!
> > > I need the installation guide!!! :)))
> >
> > See http://www.htdig.org/FAQ.html#q4.9
> >
> > That is the installation guide for PDF indexing. If you've carefully read
> > and implemented everything recommended there, and checked out FAQs 5.2
> > and 5.37 as David recommended (twice), then please provide more details,
> > such as what error messages you get, or give us an excerpt of htdig -ivvv
> > output when start_url is set to point to just one single PDF file.
> >
> > There are dozens of potential points of failure in this process, so simply
> > saying "it doesn't work" gives us no information that can help pinpoint
> > which point of failure is the one that needs to be addressed.
> >
> > Also, make sure you have links in your HTML files to all PDF files you
> > want to index. (See http://www.htdig.org/FAQ.html#q5.25)
> >
> > --
> > Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
> > Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
> > Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
> >
> > -------------------------------------------------------
> > This sf.net email is sponsored by:ThinkGeek
> > Welcome to geek heaven.
> > http://thinkgeek.com/sf
> > _______________________________________________
> > ht://Dig general mailing list: <[EMAIL PROTECTED]>
> > ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> > List information (subscribe/unsubscribe, etc.)
> > https://lists.sourceforge.net/lists/listinfo/htdig-general
>
> --
> NEW FOR EVERYONE - GMX MediaCenter - for photos, music, files...
> Photo album, file sharing, MMS, multimedia greetings, GMX FotoService
>
> Sign up now for free at http://www.gmx.net
>
> +++ GMX - the first address for Mail, Message, More! +++
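One detail in the log above can be checked mechanically: for the PDF, the Content-Length header (335416) matches "Read a total of 335416 bytes", so the whole file was retrieved. The short Python sketch below performs that comparison for every document in a saved htdig -ivvv log. It is not part of ht://Dig; the function name check_retrievals is hypothetical, and it assumes the log layout shown above (one "Header line: ..." per header, followed by a "Read a total of N bytes" line for each retrieval).

```python
import re
from typing import List, Optional, Tuple

# Patterns for the verbose lines htdig printed while retrieving documents
# in the -ivvv log above (assumption: the format is the same in other runs).
CONTENT_TYPE = re.compile(r"Header line: Content-Type:\s*([^;\s]+)")
CONTENT_LENGTH = re.compile(r"Header line: Content-Length:\s*(\d+)")
TOTAL_READ = re.compile(r"Read a total of (\d+) bytes")

# (content_type, content_length, bytes_read); header values may be missing.
Retrieval = Tuple[Optional[str], Optional[int], int]


def check_retrievals(log_text: str) -> List[Retrieval]:
    """Return (content_type, content_length, bytes_read) per retrieved document."""
    results: List[Retrieval] = []
    content_type: Optional[str] = None
    content_length: Optional[int] = None
    for line in log_text.splitlines():
        # search() rather than match(), so email quote prefixes like "> " are ignored.
        m = CONTENT_TYPE.search(line)
        if m:
            content_type = m.group(1)
            continue
        m = CONTENT_LENGTH.search(line)
        if m:
            content_length = int(m.group(1))
            continue
        m = TOTAL_READ.search(line)
        if m:
            # "Read a total of N bytes" closes one retrieval in the log.
            results.append((content_type, content_length, int(m.group(1))))
            content_type, content_length = None, None
    return results


if __name__ == "__main__":
    # Excerpt of the PDF retrieval from the log above.
    sample = "\n".join([
        "Header line: Content-Length: 335416",
        "Header line: Content-Type: application/pdf",
        "Read 7736 from document",
        "Read a total of 335416 bytes",
    ])
    for ctype, length, read in check_retrievals(sample):
        status = "complete" if length == read else "incomplete or unknown length"
        print(f"{ctype}: read {read} of {length} bytes ({status})")
```

Run on the excerpt, it prints "application/pdf: read 335416 of 335416 bytes (complete)". A mismatch between the two numbers for a PDF would point to a truncated download; when they match, as here, the retrieval step itself did its job.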

