IIRC, if you know you only need PDF extraction, take a look at Apache
PDFBox: https://pdfbox.apache.org

> On Sep 26, 2020, at 10:00 AM, Laurence Vanhelsuwe <[email protected]> 
> wrote:
> 
> Thanks for the explanation.
> 
> I understand the approach, but in my particular use case I cannot
> reasonably justify inflating my application from 7 MB to 77 MB just to add
> functionality amounting to less than 1% of the total.
> 
> I guess there’s no way to surgically extract just the PDF metadata parsing
> functionality from Tika?
> 
> Laurence
> 
>> On 26 Sep 2020, at 18:04, Keith Bennett <[email protected]> wrote:
>> 
>> Tika coordinates the use of many external-to-Tika parser libraries, each
>> with their own dependencies, for parsing many types of files. These parser
>> libraries are bundled into the tika-app jar file for your convenience. I
>> believe it's these libraries that make up the bulk of the download. For
>> example, if you unzip the jar file and inspect the contents, you can see
>> that just one of these parsers, poi, takes up 24 MB:
>> 
>> % cd org/apache/poi
>> 
>> % du -sh .
>> 24M .
>> 
>> - Keith
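
For anyone who wants the same breakdown without unzipping the jar first, here
is a minimal sketch using only java.util.zip from the JDK. The class name
JarSizeByPackage and the depth argument are my own, hypothetical choices, not
anything shipped with Tika. It sums the uncompressed size of every entry per
package prefix, so the figures should be roughly comparable to running du on
the unzipped tree, though not identical, since du counts disk blocks.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not part of Tika): totals uncompressed entry sizes per package prefix.
class JarSizeByPackage {

    // Returns the first `depth` directory segments of an entry name,
    // or "(root)" when the entry sits at the top level of the archive.
    static String prefix(String entryName, int depth) {
        String[] parts = entryName.split("/");
        int dirs = Math.min(depth, parts.length - 1); // last segment is the file name
        if (dirs <= 0) {
            return "(root)";
        }
        return String.join("/", Arrays.copyOfRange(parts, 0, dirs));
    }

    // Walks the archive's entry list and sums sizes per prefix.
    static Map<String, Long> sizes(String jarPath, int depth) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                long size = Math.max(entry.getSize(), 0); // getSize() is -1 when unknown
                totals.merge(prefix(entry.getName(), depth), size, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: JarSizeByPackage <jar> [depth]");
            return;
        }
        int depth = args.length > 1 ? Integer.parseInt(args[1]) : 3;
        // Print the 15 largest prefixes, biggest first.
        sizes(args[0], depth).entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(15)
            .forEach(e -> System.out.printf("%8.1f MB  %s%n", e.getValue() / 1e6, e.getKey()));
    }
}
```

With JDK 11 or later this can be run directly as
java JarSizeByPackage.java <path-to-tika-app-jar> 3
where a depth of 3 groups entries at the org/apache/poi level.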
>> 
>>> On Sat, Sep 26, 2020 at 7:54 AM Laurence Vanhelsuwe 
>>> <[email protected]>
>>> wrote:
>>> 
>>> I found Tika during a quest to extract PDF metadata in Java. Did I screw
>>> up the JAR download, or is Tika really 70 MB?
>>> 
>>> Kind regards,
>>> 
>>> Laurence
>>> 
>>> 
> 
