Hi,

On Wed, May 12, 2010 at 3:52 PM, Oleg Tikhonov <[email protected]> wrote:
> 1. From nutch
> http://www.docjar.com/docs/api/org/apache/nutch/parse/rtf/package-index.html

The upstream code used by the Nutch parse-rtf module contains LGPL code,
which is a bit troublesome for us. See also
https://issues.apache.org/jira/browse/NUTCH-644.

What we could do to avoid the licensing problems is start a separate
"RTF Parser for Tika" project somewhere like GitHub or Google Code
that could take over the old LGPL RTF parsing code from etranslate.com
and release a Tika Parser implementation based on that. We couldn't
include that in an official Tika release, but we could point people to
it as an optional component with different licensing conditions.

> 2. OpenOffice writer Java API
> http://wiki.services.openoffice.org/wiki/API/Samples/Java/Writer/TextDocumentStructure

AFAIUI you need a full OpenOffice installation and a separate
OpenOffice process running in the background to use the Java API. This
is of course doable, but would probably be too complex for most
deployments.

BR,

Jukka Zitting
