According to meista knut:
> I'm having some problems indexing some powerpoint and excel-files.
>
> For example I have the powerpoint file file1.ppt when running rundig
> -v it says not HTML.
>
> When I save the same file as file1.pps it shows me the correct size of
> the file but there are no -, + or *. And when I search for some
> special words, which are in the powerpoint file, via htsearch, there
> is no match.
>
> 19:19:2:http://my.dom.ain/.../file1.pps:  size = 437248
> 20:20:2:http://my.dom.ain/.../file1.ppt:  not HTML
>
> I have doc2html.pl installed and it runs with every word file (.doc)
> and as parser I have xlhtml and ppthtml.
>
> Even when I run xlhtml respectively ppthtml via console; it shows me a
> correct HTML-page.
>
> What is wrong with my configuration please help me.

Well, all I can say to that is: what is your configuration?  Without
seeing your whole external_parsers attribute setting, about all I can
do is guess.  The most likely problem is that the Content-Type your web
server returns for *.ppt files doesn't match any of the content types
you've entered in your external_parsers definition.
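
For comparison, here is a rough sketch of the sort of external_parsers
setting I would expect to see.  The path to doc2html.pl and the
content types are only assumptions on my part: use whatever location
you installed the script in, and whatever Content-Type your server
actually sends for these files.

```
# Sketch only: the /usr/local/bin path and the MIME types are assumed.
# The part before "->" must match the Content-Type header the web
# server returns for the document.
external_parsers: application/msword->text/html /usr/local/bin/doc2html.pl \
        application/vnd.ms-excel->text/html /usr/local/bin/doc2html.pl \
        application/vnd.ms-powerpoint->text/html /usr/local/bin/doc2html.pl
```

If the server sends some other type for *.ppt or *.pps files, none of
these entries will be used for them, so either the server's MIME type
mapping or the entries above need to change until the two agree.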

As for there being no -, + or * listed, that's normal, as these indicate
what htdig is doing with hypertext links it finds in a document.  It
normally won't get any of these from any external converter or parser, but
typically only from HTML files.  See http://www.htdig.org/FAQ.html#q5.26

To really get a handle on what htdig is doing, you'll need more
verbose output.  To get headers listed in the output, which you'll
need to look at to figure this out, you should use at least -vvv
(see http://www.htdig.org/FAQ.html#q4.1) to get server responses.
This will generate a lot of output, but you can bring that under control
by temporarily setting start_url to just the URL(s) of one or a few
.ppt files.
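
As a minimal sketch of that temporary change, using the URL from your
own output (keep a copy of your real start_url setting so you can put
it back afterwards):

```
# Temporary test setting: index only the problem document.
# Restore the original start_url when you are done.
start_url: http://my.dom.ain/.../file1.ppt
```

Then run rundig -vvv and look for the Content-Type header in the
server response for that URL.  That value is what has to match an
entry in external_parsers.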

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

