Tika coordinates many third-party parser libraries, each with its own dependencies, to parse many types of files. These libraries are bundled into the tika-app jar file for your convenience, and I believe they make up the bulk of the download. For example, if you unzip the jar file and inspect its contents, you can see that just one of these parsers, POI, takes up 24 MB:
% cd org/apache/poi
% du -sh .
24M .

- Keith

On Sat, Sep 26, 2020 at 7:54 AM Laurence Vanhelsuwe <[email protected]> wrote:

> I found Tika during a quest to extract PDF metadata in Java. Did i screw
> up the JAR download, or is Tika really 70MB ?
>
> Kind regards,
>
> Laurence
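
P.S. If you would rather not unzip the jar, the same kind of breakdown can be approximated from Java with the standard java.util.jar.JarFile class. This is only a rough sketch: the class name JarSizeByPackage and its helper methods are made up for illustration and are not part of Tika. It groups entries by their first three path segments (for example org/apache/poi) and adds up their uncompressed sizes, which is comparable to running du on the unzipped contents, so the totals will be larger than the compressed download.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Tika: sums uncompressed entry sizes per package prefix.
public class JarSizeByPackage {

    // Returns the first `depth` directory segments of an entry name,
    // e.g. "org/apache/poi/Foo.class" with depth 3 gives "org/apache/poi".
    static String prefix(String entryName, int depth) {
        String[] parts = entryName.split("/");
        // The last segment is the file name, so only the directory segments count.
        int n = Math.min(depth, parts.length - 1);
        if (n <= 0) {
            return "(root)";
        }
        return String.join("/", Arrays.copyOf(parts, n));
    }

    // Maps each package prefix to the total uncompressed size of its files, in bytes.
    static Map<String, Long> sizes(String jarPath, int depth) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                // getSize() returns -1 when the size is unknown; count that as 0.
                long size = Math.max(entry.getSize(), 0L);
                totals.merge(prefix(entry.getName(), depth), size, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JarSizeByPackage <path-to-jar>");
            return;
        }
        // Print the ten largest prefixes, biggest first.
        sizes(args[0], 3).entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(10)
            .forEach(e -> System.out.printf("%8.1f MB  %s%n",
                e.getValue() / (1024.0 * 1024.0), e.getKey()));
    }
}
```

Run it with the path to whichever tika-app jar you downloaded as the only argument. Swapping getSize() for getCompressedSize() should give numbers closer to what each library contributes to the download itself.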
