Thanks for the explanation. I understand the approach, but in my particular use case I cannot reasonably justify inflating my application from 7 MB to 77 MB just to add a feature that amounts to less than 1% of its functionality.
I guess there’s no way to surgically extract just the PDF metadata parsing functionality from Tika?

Laurence

> On 26 Sep 2020, at 18:04, Keith Bennett <[email protected]> wrote:
>
> Tika coordinates the use of many external-to-Tika parser libraries, each
> with their own dependencies, for parsing many types of files. These parser
> libraries are bundled into the tika-app jar file for your convenience. I
> believe it's these libraries that make up the bulk of the download. For
> example, if you unzip the jar file and inspect the contents, you can see
> that just one of these parsers, poi, consists of 24 MB:
>
> % cd org/apache/poi
>
> % du -sh .
> 24M .
>
> - Keith
>
> On Sat, Sep 26, 2020 at 7:54 AM Laurence Vanhelsuwe <[email protected]>
> wrote:
>
>> I found Tika during a quest to extract PDF metadata in Java. Did i screw
>> up the JAR download, or is Tika really 70MB ?
>>
>> Kind regards,
>>
>> Laurence
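
For reference, the unzip-and-du check Keith describes can also be sketched in plain Java without unpacking the jar. This is only an illustration: the class name JarSizeByPackage, the grouping depth of 3, and the jar path passed on the command line are assumptions of mine, and it sums uncompressed entry sizes, which should be close to, but not identical to, what du reports.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: totals the uncompressed size of jar entries per package prefix.
final class JarSizeByPackage {

    // Sums entry sizes, grouped by the first `depth` directory segments of each entry name.
    static Map<String, Long> sizeByPrefix(String jarPath, int depth) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (ZipFile jar = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                long size = entry.getSize(); // -1 when the size is not recorded
                if (size < 0) {
                    continue;
                }
                totals.merge(prefix(entry.getName(), depth), size, Long::sum);
            }
        }
        return totals;
    }

    // "org/apache/poi/Foo.class" with depth 3 becomes "org/apache/poi".
    static String prefix(String entryName, int depth) {
        String[] parts = entryName.split("/");
        int dirs = Math.min(depth, parts.length - 1); // the last segment is the file name
        if (dirs <= 0) {
            return "(root)";
        }
        return String.join("/", Arrays.copyOf(parts, dirs));
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the jar to inspect, e.g. the downloaded tika-app jar.
        sizeByPrefix(args[0], 3).entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(15)
            .forEach(e -> System.out.printf("%8.1f MB  %s%n", e.getValue() / 1e6, e.getKey()));
    }
}
```

Compiled and run with the jar path as its only argument, this prints the fifteen largest prefixes, which should make it easy to see which bundled parser libraries account for most of the download.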
