Tika coordinates many external parser libraries, each with its own
dependencies, to parse many types of files. These parser libraries are
bundled into the tika-app jar file for your convenience, and I believe
they make up the bulk of the download. For example, if you unzip the jar
file and inspect the contents, you can see that just one of these
parsers, POI, takes up 24 MB:

% cd org/apache/poi

% du -sh .
24M .
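
If you would rather not unzip the jar, here is a rough sketch that totals
the uncompressed size of the entries under each package prefix, which is
roughly what du reports after unzipping. The function name size_by_package
and the depth parameter are my own for illustration and are not part of
Tika. The compressed sizes, which are what count toward the download,
will be smaller.

```python
import zipfile
from collections import Counter


def size_by_package(jar_path, depth=3):
    """Sum uncompressed entry sizes in a jar, grouped by directory prefix.

    depth=3 groups entries under prefixes such as org/apache/poi.
    Both the function name and the depth parameter are illustrative.
    """
    totals = Counter()
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            # Drop the file name, keep at most `depth` directory components
            dirs = info.filename.split("/")[:-1]
            prefix = "/".join(dirs[:depth]) or "(root)"
            totals[prefix] += info.file_size  # uncompressed size in bytes
    return totals
```

To print the ten largest prefixes, loop over
size_by_package("tika-app.jar").most_common(10), where tika-app.jar is a
placeholder for whatever versioned file name you downloaded.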

- Keith

On Sat, Sep 26, 2020 at 7:54 AM Laurence Vanhelsuwe <[email protected]>
wrote:

> I found Tika during a quest to extract PDF metadata in Java. Did I screw
> up the JAR download, or is Tika really 70 MB?
>
> Kind regards,
>
>   Laurence
>
>
