Thanks for the explanation.

I understand the approach, but in my particular use case I cannot reasonably
justify inflating my application size from 7 MB to 77 MB just to add a
feature that accounts for less than 1% of its functionality.

I guess there’s no way to surgically extract just the PDF metadata parsing
functionality from Tika?

 Laurence

> On 26 Sep 2020, at 18:04, Keith Bennett <[email protected]> wrote:
> 
> Tika coordinates the use of many external-to-Tika parser libraries, each
> with their own dependencies, for parsing many types of files. These parser
> libraries are bundled into the tika-app jar file for your convenience. I
> believe it's these libraries that make up the bulk of the download. For
> example, if you unzip the jar file and inspect the contents, you can see
> that just one of these parsers, POI, accounts for 24 MB:
> 
> % cd org/apache/poi
> 
> % du -sh .
> 24M .
> 
> - Keith
> 
> On Sat, Sep 26, 2020 at 7:54 AM Laurence Vanhelsuwe <[email protected]>
> wrote:
> 
>> I found Tika during a quest to extract PDF metadata in Java. Did I screw
>> up the JAR download, or is Tika really 70 MB?
>> 
>> Kind regards,
>> 
>>  Laurence
>> 
>> 
