According to sudhish_c:
> hai ....
> i am sudhish and i am trying to use external parser catdoc for
> msword documents..i am using cygwin platform..when i try to compile ,
> its showing as if it has detected word document and converted,by if
> i make a search on that.its not happening , i am getting the message
> "no matches found"
> i am attaching my configured conv_doc.pl and compilation.txt which
> shows the content after executing the command "sh rundig -vvv" pls go
> thru this and revert to the fault i did..

First of all, you should always direct questions like this to the
htdig-general list, rather than to me (or any other ht://Dig developer).
See http://www.htdig.org/FAQ.html#q1.16

Looking at the attachments you sent, I see two problems immediately,
both of which would prevent htdig from calling the external converter,
conv_doc.pl, for your MS Word ".doc" files. You need to fix both of
these problems.

1) In your htdig.conf, the external_parsers attribute is not entered
correctly:

...
external_parsers:application/pdf->text/html "D:/JavaWebServer2.0/cgi-bin/Perl/5.6.0/bin/MSWin32-x86/perl5.6.0.exe D:/JavaWebServer2.0/cgi-bin/htdig-3.1.5/bin/conv_doc.pl"
\ application/msword->text/html "D:/JavaWebServer2.0/cgi-bin/Perl/5.6.0/bin/MSWin32-x86/perl5.6.0.exe D:/JavaWebServer2.0/cgi-bin/htdig-3.1.5/bin/conv_doc.pl"
...

The "\" must be at the very end of the first line, not at the start of
the second. This is the standard convention for "continuation lines"
in htdig, as in many UNIX-based utilities. The way you have it, the
second definition will not be part of external_parsers, and so htdig
would not call conv_doc.pl even if it did see a file with a Content-Type
of application/msword. You must either put the entire definition on a
single line, or if you break it up into multiple lines you must end all
lines other than the last one with the backslash "\" character.
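To illustrate why the position of the backslash matters, here is a minimal Python sketch of the usual continuation-line convention. It is a simplified model only, not ht://Dig's actual configuration parser; the function name join_continuations is hypothetical, and the long paths from your htdig.conf are shortened to placeholders.

```python
def join_continuations(lines):
    """Merge each line ending in a backslash with the line that follows it.

    Simplified model of the common UNIX convention; NOT ht://Dig's real parser.
    """
    logical, pending = [], ""
    for raw in lines:
        line = raw.rstrip()
        if line.endswith("\\"):
            # Trailing backslash: drop it and keep accumulating.
            pending += line[:-1].rstrip() + " "
        else:
            # No trailing backslash: this line ends the logical line.
            logical.append(pending + line.strip())
            pending = ""
    if pending:
        logical.append(pending.strip())
    return logical


# Placeholder commands stand in for the full perl and conv_doc.pl paths.
broken = [
    'external_parsers: application/pdf->text/html "perl conv_doc.pl"',
    '\\ application/msword->text/html "perl conv_doc.pl"',
]
fixed = [
    'external_parsers: application/pdf->text/html "perl conv_doc.pl" \\',
    '    application/msword->text/html "perl conv_doc.pl"',
]

print(len(join_continuations(broken)))  # backslash at line start: lines stay separate
print(len(join_continuations(fixed)))   # backslash at line end: one logical line
```

Under this model, the leading backslash leaves the application/msword definition stranded on a line of its own, which matches the behaviour described above.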
2) In your "compilation.txt" file, it is quite clear that your web
server is returning a content type of "text/plain", and not
"application/msword" for *.doc files:

...
2:2:1:http://localhost:8080/dig.doc: Retrieval command for http://localhost:8080/dig.doc: GET /dig.doc HTTP/1.0
User-Agent: htdig/3.1.5 ([EMAIL PROTECTED])
Referer: http://localhost:8080/
Host: localhost

Header line: HTTP/1.1 200 OK
Header line: Server: JavaWebServer/2.0
Header line: Content-Length: 23040
Header line: Content-Type: text/plain
Header line: Last-Modified: Thu, 14 Dec 2000 10:23:46 GMT
...

You need to reconfigure your JavaWebServer to return the desired
Content-Type header. In Apache, this would be done by either adding
a definition to your mime.types file, or adding an "AddType" directive
in your httpd.conf file. I'm not familiar with JavaWebServer, so you'll
have to read the documentation for it to find out how to define MIME
types on your system.

If you still have problems after fixing both of the problems above,
try running conv_doc.pl manually on one of your .doc files and see if
the output is readable text. You can also run htdig or rundig with
one more -v option, i.e. -vvvv, to get more output including a list
of every word parsed from each document.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
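P.S. For problem 2, the Content-Type the server sent for each document can be pulled out of the "rundig -vvv" output mechanically rather than by eye. The Python sketch below assumes the log looks like the excerpt from compilation.txt (a "Retrieval command for URL:" line followed by "Header line:" lines); the function name content_types is hypothetical, and other htdig versions may format their verbose output differently.

```python
import re


def content_types(log_text):
    """Map each retrieved URL to the Content-Type header logged after it.

    Assumes the verbose-output layout seen in the compilation.txt excerpt.
    """
    result = {}
    url = None
    for line in log_text.splitlines():
        started = re.search(r"Retrieval command for (\S+): ", line)
        if started:
            # Remember which URL the following header lines belong to.
            url = started.group(1)
            continue
        header = re.search(r"Header line: Content-Type:\s*([^;\s]+)", line)
        if header and url is not None:
            result[url] = header.group(1)
    return result


# Excerpt taken from the compilation.txt output quoted above.
sample = """\
2:2:1:http://localhost:8080/dig.doc: Retrieval command for http://localhost:8080/dig.doc: GET /dig.doc HTTP/1.0
Header line: HTTP/1.1 200 OK
Header line: Server: JavaWebServer/2.0
Header line: Content-Type: text/plain
"""

for doc_url, mime in content_types(sample).items():
    print(doc_url, mime)
```

Any .doc URL still reported as text/plain after you reconfigure the server will not be handed to conv_doc.pl.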

