I sure wish I knew! This is extremely puzzling, and parsing ppt files is a
requirement for me. Any ideas?

Thanks!

Shawn Coomey
Jr. Systems Administrator / Web Developer
Information Technology
Cubist Pharmaceuticals
65 Hayden Avenue
Lexington, MA 02421
Phone: (781) 860-8508

-----Original Message-----
From: David Adams [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 19, 2004 5:01 AM
To: Shawn Coomey; [EMAIL PROTECTED]
Subject: Re: [htdig] Strange results indexing Powerpoint files

I'm impressed, please tell us how you did it!

David Adams
Corporate Information Services
Information Systems Services
University of Southampton

----- Original Message ----- 
From: "Shawn Coomey" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, October 18, 2004 9:10 PM
Subject: [htdig] Strange results indexing Powerpoint files


> Hi folks-
>
> I've just set up ht://dig successfully on a Sun V120 web server (Solaris 8).
> I've got all the external parsers working properly with the exception of
> ppthtml. Apparently what gets indexed (and subsequently shows in search
> results) is not the content of the PowerPoint document itself, but the
> output of the parsing routine!
>
> Below is what is shown in my htdig -vvvv output: (note the "word:" lines...).
> Also of note: running ppthtml from the command line on the file produces the
> HTML output I was expecting. Strange indeed.
>
> Any insight would be greatly appreciated!
>
> -Shawn Coomey
>
>
> ~~~~~~~~~~~~~~~~~~~~~
> ./htdig -vvvv output:
>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Header line: HTTP/1.1 200 OK
> Header line: Date: Mon, 18 Oct 2004 18:54:25 GMT
> Header line: Server: Apache/1.3.31 (Unix) PHP/4.3.9
> Header line: Last-Modified: Mon, 18 Oct 2004 18:53:54 GMT
> Converted Mon, 18 Oct 2004 18:53:54 GMT to Mon, 18 Oct 2004 18:53:54
> Header line: ETag: "5aa73-fe00-41741142"
> Header line: Accept-Ranges: bytes
> Header line: Content-Length: 65024
> Header line: Connection: close
> Header line: Content-Type: application/vnd.ms-powerpoint
> Header line:
> returnStatus = 0
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 7680 from document
> Read a total of 65024 bytes
>  (changed) word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
> word: [EMAIL PROTECTED]
>
> ...etc., etc., etc.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: IT Product Guide on ITManagersJournal
> Use IT products in your business? Tell us what you think of them. Give us
> Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more
> http://productguide.itmanagersjournal.com/guidepromo.tmpl
> _______________________________________________
> ht://Dig general mailing list: <[EMAIL PROTECTED]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general
>
>