David,

Thanks, that was precisely what I was looking for.

Steve.

David Adams wrote:

> Start by using the utilities that you have already got:
> 
>     Don't bother with wp2html for Word 2000, use catdoc
>     Use the pdf2html.pl wrapper script with pdftotext and pdfinfo
> 
> Then go to the www.xlHtml.org site and download xlhtml (pptHtml is part of
> the download).
> 
> Later, when you are happy with the job htdig and the converters are doing:
> 
>     Upgrade pdftotext and pdfinfo to xpdf v1.0 if you haven't already,
> it's well worth the trouble.
>     Consider purchasing wp2html to give you improved indexing of Word 2000
> documents.
>     Download the swfparser code and install with the swf2html.pl wrapper
> script.
> 
> Note that swfparser does NOT extract text from Shockwave Flash files, only
> links. So you cannot index them, but it may be important on some sites to
> be able to follow the links which are embedded in them.
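
A rough sketch of the first-stage converter set recommended above, written as a
content-type to command lookup. This is only an illustration of the mapping, not
how conv_doc.pl or doc2html.pl are implemented; the table name, the function
name, the MIME type strings and the lowercase binary name "ppthtml" are
assumptions and may need adjusting for your system. Nothing is executed here;
the function only builds the command line.

```python
from typing import Dict, List, Optional

# Assumed mapping from MIME type to converter command line.
# "{path}" is a placeholder for the document being converted.
CONVERTERS: Dict[str, List[str]] = {
    "application/msword": ["catdoc", "{path}"],
    # "-" asks pdftotext to write the extracted text to standard output.
    "application/pdf": ["pdftotext", "{path}", "-"],
    "application/vnd.ms-excel": ["xlhtml", "{path}"],
    # Assumed binary name for the PowerPoint converter shipped with xlhtml.
    "application/vnd.ms-powerpoint": ["ppthtml", "{path}"],
}


def converter_command(mime_type: str, path: str) -> Optional[List[str]]:
    """Return the command to convert `path`, or None if no converter is known."""
    template = CONVERTERS.get(mime_type.lower())
    if template is None:
        return None
    return [path if arg == "{path}" else arg for arg in template]


if __name__ == "__main__":
    print(converter_command("application/pdf", "report.pdf"))
```

Flash is left out on purpose, since swfparser only yields links and is suggested
as a later addition.
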
> 
> --
> David Adams
> Computing Services
> Southampton University
> 
> 
> ----- Original Message -----
> From: "Steve Burton" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Thursday, April 11, 2002 9:28 AM
> Subject: [htdig] Recommended parser set
> 
> 
> 
>>Hi,
>>
>>I'm just starting to use htdig (3.1.6) to index our new company intranet
>>and it works (it's brilliant, in fact, but enough crawling)!
>>
>>At the moment I'm using conv_doc.pl with catdoc, pdftotext and pdfinfo
>>as external parsers but I would like to extend the number of document
>>types I can handle. I downloaded doc2html and read the docs, and now I'm
>>confused (too much choice). Can anyone recommend a parser set that
>>works? My priorities are Word 2000, PDF, Excel, PowerPoint and Flash
>>(with Flash very low on my list).
>>
>>Thanks,
>>
>>Steve.
>>
>>
>>_______________________________________________
>>htdig-general mailing list <[EMAIL PROTECTED]>
>>To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of
>>unsubscribe
>>FAQ: http://htdig.sourceforge.net/FAQ.html
>>
>>



