According to sudhish_c:
> hai ....
>        i am sudhish and i am trying to use external parser catdoc for
> msword documents..i am using cygwin platform..when i try to compile ,
> its showing as if it has detected word document and converted,by if
> i make a search on that.its not happening , i am getting the message
> "no matches found"
> i am attaching my configured conv_doc.pl and compilation.txt which
> shows the content after executing the command "sh rundig -vvv" pls go
> thru this and revert to the fault i did..

First of all, you should always direct questions like this to the
htdig-general list, rather than to me (or any other ht://Dig developer).
See http://www.htdig.org/FAQ.html#q1.16

Looking at the attachments you sent, I see two problems immediately,
both of which would prevent htdig from calling the external converter,
conv_doc.pl, for your MS Word ".doc" files.  You need to fix both of
these problems.

1) In your htdig.conf, the external_parsers attribute is not entered
correctly:

...
external_parsers:application/pdf->text/html "D:/JavaWebServer2.0/cgi-bin/Perl/5.6.0/bin/MSWin32-x86/perl5.6.0.exe D:/JavaWebServer2.0/cgi-bin/htdig-3.1.5/bin/conv_doc.pl"
\ application/msword->text/html "D:/JavaWebServer2.0/cgi-bin/Perl/5.6.0/bin/MSWin32-x86/perl5.6.0.exe D:/JavaWebServer2.0/cgi-bin/htdig-3.1.5/bin/conv_doc.pl"
...

The "\" must be at the very end of the first line, not at the start of
the second.  This is the standard convention for "continuation lines"
in htdig, as in many UNIX-based utilities.  The way you have it, the
second definition will not be part of external_parsers, and so htdig
would not call conv_doc.pl even if it did see a file with a Content-Type
of application/msword.  You must either put the entire definition on a
single line, or if you break it up into multiple lines you must end all
lines other than the last one with the backslash "\" character.
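
As a rough sketch of the multi-line form, keeping your paths and quoting
exactly as you had them and changing only the position of the backslash
(the comment line and the indentation of the second line are my additions,
and I have not tested this on your system):

```
# The backslash is the last character on the first line, so the second
# line is read as part of the same external_parsers definition.
external_parsers:application/pdf->text/html "D:/JavaWebServer2.0/cgi-bin/Perl/5.6.0/bin/MSWin32-x86/perl5.6.0.exe D:/JavaWebServer2.0/cgi-bin/htdig-3.1.5/bin/conv_doc.pl" \
    application/msword->text/html "D:/JavaWebServer2.0/cgi-bin/Perl/5.6.0/bin/MSWin32-x86/perl5.6.0.exe D:/JavaWebServer2.0/cgi-bin/htdig-3.1.5/bin/conv_doc.pl"
```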

2) In your "compilation.txt" file, it is quite clear that your web server
is returning a content type of "text/plain", and not "application/msword"
for *.doc files:

...
2:2:1:http://localhost:8080/dig.doc: Retrieval command for 
http://localhost:8080/dig.doc: GET /dig.doc HTTP/1.0
User-Agent: htdig/3.1.5 ([EMAIL PROTECTED])
Referer: http://localhost:8080/
Host: localhost

Header line: HTTP/1.1 200 OK
Header line: Server: JavaWebServer/2.0
Header line: Content-Length: 23040
Header line: Content-Type: text/plain
Header line: Last-Modified: Thu, 14 Dec 2000 10:23:46 GMT
...

You need to reconfigure your JavaWebServer to return the desired
Content-Type header.  In Apache, this would be done by either adding a
definition to your mime.types file, or adding an "AddType" directive in
your httpd.conf file.  I'm not familiar with JavaWebServer, so you'll
have to read the documentation for it to find out how to define MIME
types on your system.
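
For comparison only, here is roughly what the two Apache approaches look
like.  This is Apache syntax and will not work as-is in JavaWebServer; it
is just meant to show the mapping you need to reproduce there, i.e. the
".doc" extension to the type "application/msword".  You would use one
approach or the other, not both:

```
# Option 1, in httpd.conf: map the .doc extension to the MS Word type
AddType application/msword .doc

# Option 2, in mime.types: the MIME type followed by its extension(s)
application/msword      doc
```

Once the equivalent change is made on your server, the "Header line:
Content-Type:" entry in the rundig output for dig.doc should read
application/msword rather than text/plain.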

If you still have problems after fixing both of the problems above, try
running conv_doc.pl manually on one of your .doc files and see if the
output is readable text.  You can also run htdig or rundig with one more
-v option, i.e. -vvvv, to get more output including a list of every word
parsed from each document.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
