David, thanks. That was precisely what I was looking for.
Steve.

David Adams wrote:
> Start by using the utilities that you have already got:
>
> Don't bother with wp2html for Word 2000; use catdoc.
> Use the pdf2html.pl wrapper script with pdftotext and pdfinfo.
>
> Then go to the www.xlHtml.org site and download xlhtml (pptHtml is part of
> the download).
>
> Later, when you are happy with the job htdig and the converters are doing:
>
> Upgrade pdftotext and pdfinfo to xpdf v1.0 if you haven't already;
> it's well worth the trouble.
> Consider purchasing wp2html to give you improved indexing of Word 2000
> documents.
> Download the swfparser code and install it with the swf2html.pl wrapper
> script.
>
> Note that swfparser does NOT extract text from Shockwave Flash files, only
> links. So you cannot index them, but on some sites it may be important to
> be able to follow the links embedded in them.
>
> --
> David Adams
> Computing Services
> Southampton University
>
>
> ----- Original Message -----
> From: "Steve Burton" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Thursday, April 11, 2002 9:28 AM
> Subject: [htdig] Recommended parser set
>
>
>>Hi,
>>
>>I've just started using htdig (3.1.6) to index our new company intranet,
>>and it works (it's brilliant, in fact, but enough crawling)!
>>
>>At the moment I'm using conv_doc.pl with catdoc, pdftotext and pdfinfo
>>as external parsers, but I would like to extend the number of document
>>types I can handle. I downloaded doc2html and read the docs, and now I'm
>>confused (too much choice). Can anyone recommend a parser set that
>>works? My priorities are Word 2000, PDF, Excel, PowerPoint and Flash
>>(with Flash very low on my list).
>>
>>Thanks,
>>
>>Steve.
>>
>>_______________________________________________
>>htdig-general mailing list <[EMAIL PROTECTED]>
>>To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
>>FAQ: http://htdig.sourceforge.net/FAQ.html
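David's recommended set boils down to picking one converter per document type. The Python sketch below illustrates only that dispatch idea; it is not part of htdig, conv_doc.pl or doc2html. The MIME type strings, the `converter_command` helper and the exact command lines are assumptions for illustration (check each tool's own documentation), and the sketch builds the command lines without running any of the tools.

```python
# Hypothetical MIME-type -> converter table reflecting David's suggestions.
# The command forms are assumptions: catdoc, xlhtml and ppthtml are assumed
# to write to stdout, and pdftotext is given "-" as its output file so it
# writes to stdout. Flash is deliberately absent: swfparser yields links only.
CONVERTERS = {
    "application/msword": ["catdoc", "{path}"],
    "application/pdf": ["pdftotext", "{path}", "-"],
    "application/vnd.ms-excel": ["xlhtml", "{path}"],
    "application/vnd.ms-powerpoint": ["ppthtml", "{path}"],
}


def converter_command(mime_type, path):
    """Return the argv list for converting `path`, or None if no converter is set."""
    template = CONVERTERS.get(mime_type)
    if template is None:
        return None
    # Substitute the document path into the placeholder argument.
    return [arg.replace("{path}", path) for arg in template]


if __name__ == "__main__":
    for mime in ("application/msword", "application/pdf",
                 "application/x-shockwave-flash"):
        cmd = converter_command(mime, "report.bin")
        print(mime, "->", " ".join(cmd) if cmd else "no converter configured")
```

Keeping the table separate from the lookup means that swapping catdoc for wp2html later, as David suggests, is a one-line change.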

