PDFBox did the trick for me. Thanks for the golden tip. :-)

> On 26 Sep 2020, at 19:06, Dave Fisher <[email protected]> wrote:
> 
> IIRC - if you know you only want PDF extraction then take a look at Apache 
> PDFBox. PDFBox.apache.org
> 
> Sent from my iPhone
> 
>> On Sep 26, 2020, at 10:00 AM, Laurence Vanhelsuwe <[email protected]> 
>> wrote:
>> 
>> Thanks for the explanation.
>> 
>> I understand the approach, but in my particular use case I cannot 
>> reasonably justify inflating my application size from 7 MB to 77 MB just 
>> to add a feature amounting to less than 1% of its functionality.
>> 
>> I guess there’s no way to surgically extract just the PDF metadata parsing 
>> functionality from Tika?
>> 
>> Laurence
>> 
>>> On 26 Sep 2020, at 18:04, Keith Bennett <[email protected]> wrote:
>>> 
>>> Tika coordinates the use of many external-to-Tika parser libraries, each
>>> with their own dependencies, for parsing many types of files. These parser
>>> libraries are bundled into the tika-app jar file for your convenience. I
>>> believe it's these libraries that make up the bulk of the download. For
>>> example, if you unzip the jar file and inspect the contents, you can see
>>> that just one of these parsers, POI, takes up 24 MB:
>>> 
>>> % cd org/apache/poi
>>> 
>>> % du -sh .
>>> 24M .
>>> 
>>> - Keith
>>> 
>>>> On Sat, Sep 26, 2020 at 7:54 AM Laurence Vanhelsuwe 
>>>> <[email protected]>
>>>> wrote:
>>>> 
>>>> I found Tika during a quest to extract PDF metadata in Java. Did I screw
>>>> up the JAR download, or is Tika really 70 MB?
>>>> 
>>>> Kind regards,
>>>> 
>>>> Laurence
>>>> 
>>>> 
>> 
> 
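
Keith's check above (unzip the jar, then `du -sh` on one package) can also be done without unpacking anything. What follows is only a rough sketch using the JDK's own `java.util.zip` classes; the class name `JarSizeByPackage`, the helper names, and the grouping depth of 3 path segments are my own assumptions for illustration, not anything taken from Tika. It sums the uncompressed entry sizes per package prefix, so the numbers should be in the same ballpark as `du` on the unpacked tree, not the compressed size of the jar on disk.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class JarSizeByPackage {

    /** Sums uncompressed entry sizes, grouped by the first `depth` directory segments. */
    static Map<String, Long> sizeByPrefix(Path jar, int depth) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                long size = entry.getSize(); // -1 when the size is not recorded
                if (size < 0) {
                    continue;
                }
                totals.merge(prefixOf(entry.getName(), depth), size, Long::sum);
            }
        }
        return totals;
    }

    /** Returns up to `depth` leading directory segments of an entry name. */
    static String prefixOf(String entryName, int depth) {
        String[] parts = entryName.split("/");
        // The last segment is the file name itself, so it never joins the prefix.
        int n = Math.min(depth, parts.length - 1);
        if (n <= 0) {
            return "(root)";
        }
        return String.join("/", Arrays.copyOfRange(parts, 0, n));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarSizeByPackage <path-to-jar>");
            return;
        }
        // Print the 15 largest prefixes, biggest first.
        sizeByPrefix(Paths.get(args[0]), 3).entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(15)
            .forEach(e -> System.out.printf("%8.1f MB  %s%n",
                e.getValue() / 1e6, e.getKey()));
    }
}
```

Run it with the path to the tika-app jar as the only argument. A prefix such as `org/apache/poi` would then appear as one line in the listing, which makes it easy to see which bundled parsers account for most of the size.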
