I know this may have been asked before, but here goes: I have a number of PDF documents that I would like to index. For now, I'm trying to index just one of them. htdig works fine for HTML documents, and it appears to work fine for the PDFs as well, but when I run a search, none of the PDF content turns up in the index. The relevant output from rundig -vvv is included below. I'm using the doc2html script that came with htdig to do a pdftotext conversion. I've verified that content extraction is allowed for this PDF; it actually has no security on it. I've also raised max_doc_size in htdig.conf to allow for this large document.
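
For reference, the two htdig.conf settings described above would look roughly like the sketch below. This is an illustration, not a copy of the actual file: the converter path is a placeholder, and the size value is just an example chosen to exceed the 4971169-byte document.

```
# Sketch only: the path and the size value are assumptions, not the real config.

# Must be larger than the PDF (4971169 bytes); otherwise htdig truncates the document.
max_doc_size: 5000000

# Hand application/pdf documents to doc2html, which converts them to HTML for indexing.
external_parsers: application/pdf->text/html /path/to/doc2html.pl
```

The doc2html script also has to be able to find pdftotext; where that is configured depends on the doc2html version in use.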
Any suggestions as to what I'm doing wrong?

+href: http://<hostname>/tsmdrm.pdf (TSM DRM Guide (IBM Redbook))
resolving 'http://<hostname>/tsmdrm.pdf'
pushing http://<hostname>/tsmdrm.pdf
+
size = 667
12983:12983:1:http://<hostname>/tsmdrm.pdf:
Retrieval command for http://<hostname>/tsmdrm.pdf:
GET /tsmdrm.pdf HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Referer: http://<hostname>/books/
Host: <hostname>
Header line: HTTP/1.1 200 OK
Header line: Date: Wed, 14 Jan 2004 05:43:49 GMT
Header line: Server: Apache/1.3.28 (Unix) PHP/4.3.3
Header line: Last-Modified: Wed, 14 Jan 2004 02:42:49 GMT
Converted Wed, 14 Jan 2004 02:42:49 GMT to Wed, 14 Jan 2004 02:42:49
Header line: ETag: "1ae685-4bdaa1-4004aca9"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 4971169
Header line: Connection: close
Header line: Content-Type: application/pdf
<snip a bunch of garbage>
Read a total of 4971169 bytes
size = 4971169
12983/http://<hostname>/tsmdrm.pdf

