The TextMining.org website keeps getting hacked, and I don't have the time to upgrade PostNuke to a more secure version. Also, for legal reasons I can't maintain the software. I am more than willing to "hand off" the project to Lucene or someone else. It's under the Apache 2.0 license, so anyone can fork it at any time and use any license they want. However, if someone wants to take it over and gets my seal of approval, I will make the textmining.org home page redirect to your site.
It extracts text from Word documents pretty solidly. When there are problems, they are caused by fast-saved files or by files saved with a .doc extension that aren't actually Word documents (e.g. RTF or HTML). Unlike POI, it supports Word 6.0/95 documents. There are many ways it could be improved, but in my opinion they are trivial changes. The core logic is solid and is used in commercial and government applications. Send me an email directly if you are interested.