According to [EMAIL PROTECTED]:
> > > > If you define the script output as text/html instead of text/plain,
> > > > then htdig's HTML parser should treat the XML tags as unknown HTML tags
> > > > and just ignore them.
> > >
> > > I thought that, but didn't want to risk it. Thank you, it looks like we
> > > will have a perfect database on Monday! Shall I make a small package
> > > for the contrib section, or did I re-invent the wheel?
>
> Setting the output to text/html did not work: only a few OOffice docs
> were parsed correctly, and most were not indexed correctly. I have now
> removed all the XML tags with sed and still parse the output as plain text.
>
> I attached a tar with the script and the readme; I hope it's useful.
>
> regards
>
> David Berger
>
> NetMon GmbH
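
For readers without the contributed tarball at hand, the tag-stripping step David describes can be sketched roughly as follows. This is not his script (which uses sed); it is a minimal Python illustration that assumes the document is an OpenOffice file, i.e. a zip archive whose body lives in content.xml. The function names strip_xml_tags and ooffice_to_text are made up for this example.

```python
import html
import re
import zipfile

# Matches anything that looks like an XML tag, e.g. <text:p> or </text:span>.
TAG_RE = re.compile(r"<[^>]+>")


def strip_xml_tags(xml_text: str) -> str:
    """Replace each XML tag with a space, decode entities, collapse whitespace."""
    text = TAG_RE.sub(" ", xml_text)
    text = html.unescape(text)  # e.g. &amp; becomes &
    return " ".join(text.split())


def ooffice_to_text(path_or_file) -> str:
    """Read content.xml from an OpenOffice document and return plain text.

    Assumes the archive contains a UTF-8 encoded member named content.xml.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        xml_text = archive.read("content.xml").decode("utf-8")
    return strip_xml_tags(xml_text)


if __name__ == "__main__":
    import sys

    # Print plain text on stdout, as a text/plain external parser would.
    print(ooffice_to_text(sys.argv[1]))
```

Replacing each tag with a space keeps words in adjacent paragraphs from running together, at the cost of occasionally splitting a word that is formatted mid-word. A regex of this kind also ignores the case of a ">" inside an attribute value, so it is only a rough approximation of a real XML parser.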

Thanks! I extracted the script and readme file, and put them up on
http://www.htdig.org/files/contrib/parsers/

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

