IIRC, if you know you only need PDF extraction, take a look at Apache PDFBox: PDFBox.apache.org
> On Sep 26, 2020, at 10:00 AM, Laurence Vanhelsuwe wrote:
>
> Thanks for the explanation.
>
> I understand the approach, but in my particular use case I cannot
> reasonably justify inflating my application size from 7 MB to 77 MB just to add
> functionality amounting to less than 1% of all functionality.
>
> I guess there's no way to surgically extract just the PDF metadata parsing
> functionality from Tika?
>
> Laurence
>
>> On 26 Sep 2020, at 18:04, Keith Bennett wrote:
>>
>> Tika coordinates the use of many external-to-Tika parser libraries, each
>> with its own dependencies, for parsing many types of files. These parser
>> libraries are bundled into the tika-app jar file for your convenience. I
>> believe it's these libraries that make up the bulk of the download. For
>> example, if you unzip the jar file and inspect the contents, you can see
>> that just one of these parser libraries, POI, takes up 24 MB:
>>
>> % cd org/apache/poi
>> % du -sh .
>> 24M    .
>>
>> - Keith
>>
>>> On Sat, Sep 26, 2020 at 7:54 AM Laurence Vanhelsuwe wrote:
>>>
>>> I found Tika during a quest to extract PDF metadata in Java. Did I screw
>>> up the JAR download, or is Tika really 70 MB?
>>>
>>> Kind regards,
>>>
>>> Laurence
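Keith's unzip-and-`du` check can also be done without extracting anything. Below is a minimal sketch, assuming only the JDK's `java.util.zip` API: it sums the uncompressed size of each jar entry and groups the totals by the first few directory names. The class name `JarSizeReport`, the default depth of 3, and the top-20 cutoff are illustrative choices, not anything that ships with Tika. The totals approximate, but will not exactly match, `du`, which counts disk blocks rather than bytes.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: approximates "unzip the jar, then du -sh each package directory"
// by reading entry sizes straight from the jar's central directory.
class JarSizeReport {

    // Sums the uncompressed size of every file entry, keyed by its first `depth` directory names.
    static Map<String, Long> sizeByPrefix(String jarPath, int depth) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Skip directory entries and entries whose size is not recorded (-1).
                if (entry.isDirectory() || entry.getSize() < 0) {
                    continue;
                }
                totals.merge(prefixOf(entry.getName(), depth), entry.getSize(), Long::sum);
            }
        }
        return totals;
    }

    // "org/apache/poi/hssf/X.class" at depth 3 -> "org/apache/poi".
    // Files shallower than `depth` are grouped under their own directory, or "(root)".
    static String prefixOf(String entryName, int depth) {
        String[] parts = entryName.split("/");
        int dirs = Math.min(depth, parts.length - 1); // last segment is the file name
        return dirs == 0 ? "(root)" : String.join("/", Arrays.copyOfRange(parts, 0, dirs));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JarSizeReport <path-to-jar> [depth]");
            System.exit(1);
        }
        int depth = args.length > 1 ? Integer.parseInt(args[1]) : 3;
        // Print the 20 largest prefixes, biggest first.
        sizeByPrefix(args[0], depth).entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(20)
                .forEach(e -> System.out.printf("%8.1f MB  %s%n",
                        e.getValue() / (1024.0 * 1024.0), e.getKey()));
    }
}
```

With Java 11 or later it can be run directly as `java JarSizeReport.java path/to/tika-app.jar` (the jar path is a placeholder). Listing the largest prefixes first makes it easy to see which bundled parser libraries account for most of the download.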
