I know that this might have been asked before, but here goes:

I've got a bunch of PDF documents that I would like to index.  Currently,
I'm only trying to index one of them.  I have htdig working just fine for
HTML documents, and it appears to work fine for the PDFs, but when I do a
search, I can't find any of the PDF content in the index.  All of the
relevant output from rundig -vvv is below.  I'm using the doc2html script
that came with htdig to do a pdftotext conversion.  I've verified that
content extraction is allowed for this PDF; it actually has no security on
it.  I've also raised max_doc_size in htdig.conf to allow for this large
document.
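
For reference, the setup described above corresponds roughly to htdig.conf
entries like the ones below.  This is only a sketch: the doc2html.pl path
and the size value are placeholders chosen for illustration, not the exact
lines from the config in question.  The only constraint taken from the log
is that max_doc_size has to be at least the 4971169 bytes reported for the
PDF.

```
# Placeholder value; must be >= the PDF's Content-Length (4971169 bytes)
max_doc_size:      5000000

# Hand application/pdf to doc2html for conversion to HTML.
# The script path is an assumption; adjust it to the local install.
external_parsers:  application/pdf->text/html /usr/local/bin/doc2html.pl
```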

Any suggestions as to what I'm doing wrong?

+href: http://<hostname>/tsmdrm.pdf (TSM DRM Guide (IBM Redbook))
resolving 'http://<hostname>/tsmdrm.pdf'

   pushing http://<hostname>/tsmdrm.pdf
+ size = 667
12983:12983:1:http://<hostname>/tsmdrm.pdf: Retrieval command for
http://<hostname>/tsmdrm.pdf: GET /tsmdrm.pdf HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Referer: http://<hostname>/books/
Host: <hostname>

Header line: HTTP/1.1 200 OK
Header line: Date: Wed, 14 Jan 2004 05:43:49 GMT
Header line: Server: Apache/1.3.28 (Unix) PHP/4.3.3
Header line: Last-Modified: Wed, 14 Jan 2004 02:42:49 GMT
Converted Wed, 14 Jan 2004 02:42:49 GMT to Wed, 14 Jan 2004 02:42:49
Header line: ETag: "1ae685-4bdaa1-4004aca9"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 4971169
Header line: Connection: close
Header line: Content-Type: application/pdf
<snip a bunch of garbage>
Read a total of 4971169 bytes
 size = 4971169


12983/http://<hostname>/tsmdrm.pdf


