Hi,

On Wed, May 12, 2010 at 3:52 PM, Oleg Tikhonov <[email protected]> wrote:
> 1. From nutch
> http://www.docjar.com/docs/api/org/apache/nutch/parse/rtf/package-index.html

The upstream code used by the nutch parse-rtf module contains LGPL
code, which is a bit troublesome for us. See also
https://issues.apache.org/jira/browse/NUTCH-644.

What we could do to avoid the licensing problems is start a separate
"RTF Parser for Tika" project somewhere like GitHub or Google Code
that could take over the old LGPL RTF parsing code from etranslate.com
and release a Tika Parser implementation based on that. We couldn't
include that in an official Tika release, but we could point people to
it as an optional component with different licensing conditions.

> 2. OpenOffice writer Java API
> http://wiki.services.openoffice.org/wiki/API/Samples/Java/Writer/TextDocumentStructure

AFAIUI you need a full OpenOffice installation and a separate
OpenOffice process running in the background to use the Java API.
This is of course doable, but would probably be too complex for most
deployments.

BR,

Jukka Zitting
