Re: [htdig] Indexing OpenOffice-Documents

berger Fri, 02 Aug 2002 09:07:33 -0700


> > > If you define the script output as text/html instead of text/plain,
> > > then htdig's HTML parser should treat the XML tags as unknown HTML tags
> > > and just ignore them.
> > 
> > I thought that, but didn't want to risk it. Thank you, looks like we
> > will have a perfect database on Monday! Shall I make a small package
> > for the contrib section, or did I reinvent the wheel?


Setting the output to text/html did not work: only a few OpenOffice documents
were parsed correctly, and most were not indexed correctly. I now remove all the
XML tags with sed and still have htdig parse the result as plain text.
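
For readers who do not have the attachment, the approach can be sketched roughly as follows. This is only an illustration and not the contents of sxw2plain.tar: the function names strip_xml_tags and sxw_to_plain are made up here, and it assumes that the .sxw file is a zip archive with the document body in content.xml and that unzip is installed.

```shell
#!/bin/sh
# Hypothetical sketch only; NOT the script shipped in sxw2plain.tar.

# Replace every XML tag with a single space so that words on either
# side of a tag do not run together in the plain-text output.
strip_xml_tags() {
    sed -e 's/<[^>]*>/ /g'
}

# Write the text of an OpenOffice Writer document to stdout.
# Assumption: the .sxw file is a zip archive and content.xml holds the body.
sxw_to_plain() {
    unzip -p "$1" content.xml | strip_xml_tags
}

# Example call (file name is a placeholder):
#   sxw_to_plain report.sxw
```

Note that sed works line by line, so a tag that is split across two lines would not be removed by this pattern, and entities such as &amp; are left undecoded.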

I have attached a tar archive with the script and the README; I hope it's useful.

regards

David Berger

NetMon GmbH

sxw2plain.tar
Description: Unix tar archive
