Dear Sir or Madam,
I have been given the task of familiarizing myself with htdig in order to set up
an intranet search engine that can be accessed only with a password and that
indexes every local document on our network.
We decided in favor of HT://DIG.
For about two weeks I have been wrestling with the following problem every day.
I'm using SUSE 9.1, currently htdig-3.2.0b6, and Apache 2.0.53 (XAMPP).
Everything works fine with HTML and text documents, but indexing fails for DOC
and PDF files.
I installed catdoc to parse the MS Word files and Acrobat for the PDF files.
I have spent hours reading every piece of information I could find in the
forums and FAQs related to my problem, but nothing seems to work.
I'm quite sure that my htdig.conf is correct, and so is my doc2html.pl.
Invoking the Perl script at the command line works fine, and the verbose
mode of rundig shows that Apache sends the correct MIME type for each
file!
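To rule out a difference between my manual test and the way htdig itself calls the converter, here is a minimal Python sketch that reproduces the call. It assumes, as I understand the external_parsers documentation, that htdig appends four arguments to the configured command: the input file, the content type, the URL, and the path of the configuration file. The function names build_parser_argv and run_parser are hypothetical, not part of htdig.

```python
import subprocess
import sys


def build_parser_argv(parser_cmd, infile, content_type, url, config_file):
    """Build the converter's argument list.

    Assumption: htdig appends four arguments (input file, content type,
    URL, config file) to whatever command is configured in external_parsers.
    """
    return list(parser_cmd) + [infile, content_type, url, config_file]


def run_parser(argv, timeout=60):
    """Run the converter and return its standard output as text.

    If the converter exits with a non-zero status, its stderr is echoed so
    error messages are not lost. subprocess.TimeoutExpired is raised if it
    runs longer than `timeout` seconds.
    """
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        sys.stderr.write(result.stderr)
    return result.stdout
```

For example, run_parser(build_parser_argv(["/usr/local/bin/doc2html.pl"], "/tmp/test.doc", "application/msword", "http://localhost/test/test.doc", "/etc/htdig/htdig.conf")) should return the converted HTML. All of these paths are hypothetical placeholders for the real locations. If the result comes back empty, that would point at the converter rather than at htdig.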
It seems to me that, for some reason I can't identify, the external parser
either returns no text or is not being called correctly. Here is what
'rundig -vvv' prints:
"
.
.
.
href: http://localhost/test/test.doc (___-= TEST - MSWORD-FILE =-___ )
resolving 'http://localhost/test/test.doc'
pushing http://localhost/test/test.doc
+Tag: <br>, matched -1
Tag: <br>, matched -1
Tag: </div>, matched -1
Tag: <hr noshade size="4">, matched -1
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
Tag: <br>, matched -1
.
.
.
Header line: HTTP/1.1 200 OK
Header line: Date: Tue, 05 Apr 2005 13:09:19 GMT
Header line: Server: Apache/2.0.53 (Unix) mod_ssl/2.0.53 OpenSSL/0.9.7d PHP/5.0.3 DAV/2 mod_perl/1.999.21 Perl/v5.8.6
Header line: Last-Modified: Fri, 13 Sep 2002 10:25:48 GMT
Converted Fri, 13 Sep 2002 10:25:48 GMT to Fri, 13 Sep 2002 10:25:48
Header line: ETag: "1c6e6-7e00-e8c8a300"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 32256
Header line: Connection: close
Header line: Content-Type: application/msword
not HTML
pick: localhost, # servers = 1
.
.
.
htmerge: Sorting...
htmerge: Removing doc #1
htmerge: Removing doc #3
htmerge: Merging...
htmerge: Discarding docfile in doc #1
htmerge: Discarding mswordfile in doc #3
htmerge: Discarding test in doc #3
.
.
.
Deleted, no excerpt: 3/http://localhost/test/test.doc
"
As I'm inching towards despair, any help would be appreciated!
Thank you!

Best regards,
 Jeremy Prasetyo




_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general