Re: [htdig] Indexing OpenOffice-Documents

Gilles Detillieux Fri, 02 Aug 2002 09:19:54 -0700

According to [EMAIL PROTECTED]:
> > > > If you define the script output as text/html instead of text/plain,
> > > > then htdig's HTML parser should treat the XML tags as unknown HTML tags
> > > > and just ignore them.
> > > 
> > > I thought so, but didn't want to risk it. Thank you, it looks like we
> > > will have a perfect database on Monday! Shall I make a small package
> > > for the contrib section, or did I reinvent the wheel?
> 
> Setting the output to text/html did not work: only a few OpenOffice
> documents were parsed correctly, and most were not indexed properly.
> I now strip all the XML tags with sed and still parse the result as
> plain text.
> 
> I attached a tar with the script and the readme; I hope it's useful.
> 
> regards
> 
> David Berger
> 
> NetMon GmbH


Thanks!  I extracted the script and readme file, and put them up on
http://www.htdig.org/files/contrib/parsers/
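
For readers who want to see the idea without downloading the tarball, here
is a rough sketch of the tag-stripping approach David describes. It is not
his script (that one uses sed). It assumes the document is an OpenOffice
file, i.e. a zip archive whose text lives in content.xml, and the function
name extract_text is made up for this illustration.

```python
import html
import re
import sys
import zipfile


def extract_text(source):
    """Return the plain text of an OpenOffice document.

    `source` is a path or a binary file-like object. Hypothetical helper:
    this is an illustration, not the contributed parser.
    """
    # An OpenOffice document is a zip archive; the body is in content.xml.
    with zipfile.ZipFile(source) as archive:
        xml = archive.read("content.xml").decode("utf-8")

    # Replace every tag with a space so words in adjacent elements
    # do not run together once the markup is gone.
    text = re.sub(r"<[^>]+>", " ", xml)

    # Turn entities such as &amp; back into plain characters.
    text = html.unescape(text)

    # Collapse the leftover runs of whitespace.
    return " ".join(text.split())


if __name__ == "__main__":
    # Print the text of each file named on the command line.
    for path in sys.argv[1:]:
        print(extract_text(path))
```

The output is plain text, which matches the text/plain route that ended up
working in this thread. A regular expression is a blunt tool for XML, so a
proper XML parser may be the safer choice for unusual documents.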

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

