Shawn,

From the information provided, the only possible answer is "do it for .ppt
files in the same way that you have succeeded with other external parsers".

PLEASE: tell us what you have done!  What method are you using?  Give us the
external_parsers: statement from your config file.  What external parsers
are you using?  Give us enough information to be able to help you.
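
For comparison, a converter entry for PowerPoint usually takes a shape like
the sketch below. The path is only a placeholder, and I am assuming ppthtml
is meant to be run as a converter to text/html rather than as a parser that
emits ht://Dig's own external-parser record format:

```
# Hypothetical entry: the ppthtml path is an assumption, not your actual setup.
external_parsers: application/vnd.ms-powerpoint->text/html /usr/local/bin/ppthtml
```

Without the "->text/html" part, htdig expects the program's output to be in
its own parser record format, not HTML.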

David Adams
Corporate Information Services
Information Systems Services
University of Southampton

----- Original Message ----- 
From: "Shawn Coomey" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 19, 2004 2:49 PM
Subject: RE: [htdig] Strange results indexing Powerpoint files


>
> I sure wish I knew! This is extremely puzzling. And a requirement for me is
> parsing of ppt files. Any ideas?
>
> Thanks!
>
> Shawn Coomey
> Jr. Systems Administrator / Web Developer
> Information Technology
> Cubist Pharmaceuticals
> 65 Hayden Avenue
> Lexington, MA 02421
> Phone: (781) 860-8508
>
> -----Original Message-----
> From: David Adams [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 19, 2004 5:01 AM
> To: Shawn Coomey; [EMAIL PROTECTED]
> Subject: Re: [htdig] Strange results indexing Powerpoint files
>
> I'm impressed, please tell us how you did it!
>
> David Adams
> Corporate Information Services
> Information Systems Services
> University of Southampton
>
> ----- Original Message ----- 
> From: "Shawn Coomey" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Monday, October 18, 2004 9:10 PM
> Subject: [htdig] Strange results indexing Powerpoint files
>
>
> > Hi folks-
> >
> > I've just set up ht://dig successfully on a Sun V120 web server (Solaris 8).
> > I've got all the external parsers working properly with the exception of
> > ppthtml. Apparently what gets indexed (and subsequently shows in search
> > results) is not the content of the powerpoint document itself, but the output
> > of the parsing routine!
> >
> > Below is what is shown in my htdig -vvvv output: (note the "word:" lines...).
> > Also of note: running ppthtml from the command line on the file produces the
> > HTML output I was expecting. Strange indeed.
> >
> > Any insight would be greatly appreciated!
> >
> > -Shawn Coomey
> >
> >
> > ~~~~~~~~~~~~~~~~~~~~~
> > ./htdig -vvvv output:
> >
>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > Header line: HTTP/1.1 200 OK
> > Header line: Date: Mon, 18 Oct 2004 18:54:25 GMT
> > Header line: Server: Apache/1.3.31 (Unix) PHP/4.3.9
> > Header line: Last-Modified: Mon, 18 Oct 2004 18:53:54 GMT
> > Converted Mon, 18 Oct 2004 18:53:54 GMT to Mon, 18 Oct 2004 18:53:54
> > Header line: ETag: "5aa73-fe00-41741142"
> > Header line: Accept-Ranges: bytes
> > Header line: Content-Length: 65024
> > Header line: Connection: close
> > Header line: Content-Type: application/vnd.ms-powerpoint
> > Header line:
> > returnStatus = 0
> > Read 8192 from document
> > Read 8192 from document
> > Read 8192 from document
> > Read 8192 from document
> > Read 8192 from document
> > Read 8192 from document
> > Read 8192 from document
> > Read 7680 from document
> > Read a total of 65024 bytes
> >  (changed) word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> > word: [EMAIL PROTECTED]
> >
> > ...etc, etc etc.
> >
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> >
> >
> > -------------------------------------------------------
> > This SF.net email is sponsored by: IT Product Guide on ITManagersJournal
> > Use IT products in your business? Tell us what you think of them. Give us
> > Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more
> > http://productguide.itmanagersjournal.com/guidepromo.tmpl
> > _______________________________________________
> > ht://Dig general mailing list: <[EMAIL PROTECTED]>
> > ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> > List information (subscribe/unsubscribe, etc.)
> > https://lists.sourceforge.net/lists/listinfo/htdig-general
> >
> >
>
>
>


