> > > If you define the script output as text/html instead of text/plain,
> > > then htdig's HTML parser should treat the XML tags as unknown HTML tags
> > > and just ignore them.
> >
> > I thought that, but didn't want to risk it, thank you, looks like we
> > will have a perfect database on Monday! Shall I make a small package
> > for the contrib-section or did I re-invent the wheel?

Setting the output to text/html did not work: only a few OpenOffice documents were parsed correctly, and most were not indexed properly. I now remove all the XML tags with sed and still parse the result as plain text. I have attached a tar archive with the script and the README; I hope it's useful.

Regards,
David Berger
NetMon GmbH

sxw2plain.tar
Description: Unix tar archive
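
For readers without the attachment, the tag-stripping idea described above can be sketched roughly as follows. This is not the attached script (which uses sed); it is a minimal Python illustration, and the function names `strip_tags` and `sxw_to_plain` are made up for this example. It assumes the document is an OpenOffice.org 1.x .sxw file, i.e. a zip archive whose text lives in content.xml.

```python
import re
import zipfile
from html import unescape

# Matches any XML tag, e.g. <text:p text:style-name="P1"> or </text:p>
TAG = re.compile(r"<[^>]+>")


def strip_tags(xml: str) -> str:
    """Remove all XML tags and return whitespace-normalised plain text."""
    # Replace each tag with a space so words in adjacent elements
    # do not run together.
    text = TAG.sub(" ", xml)
    # Decode entities such as &amp; only after the tags are gone,
    # so an escaped "&lt;b&gt;" is not mistaken for a tag.
    text = unescape(text)
    return " ".join(text.split())


def sxw_to_plain(path) -> str:
    """Read content.xml from an .sxw archive and strip its tags.

    Hypothetical helper: `path` may be a file name or a file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("content.xml").decode("utf-8")
    return strip_tags(xml)
```

The result can then be written to standard output and handed to htdig as text/plain, as in the approach described above.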

