PDFBox did the trick for me. Thanks for the golden tip. :-)
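
For anyone finding this thread in the archives: the jar inspection Keith
describes further down can also be done without unzipping anything. What
follows is only a rough sketch using the standard java.util.zip classes; the
class name JarSizeByPackage and the method sizeByPrefix are made up for
illustration and are not part of Tika or PDFBox. It sums the uncompressed
entry sizes per package prefix, so the figures approximate, but will not
exactly match, what du reports on an unpacked jar.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Tika: reports how much each package
// prefix contributes to the uncompressed size of a jar.
public class JarSizeByPackage {

    /** Sums uncompressed entry sizes (bytes), keyed by the first `depth` directory segments. */
    public static Map<String, Long> sizeByPrefix(Path jar, int depth) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                if (entry.isDirectory() || entry.getSize() < 0) {
                    continue; // skip directories and entries of unknown size
                }
                String[] parts = entry.getName().split("/");
                // The last segment is the file name; shallow files keep their full directory path.
                int n = Math.min(depth, parts.length - 1);
                String key = n == 0 ? "(root)" : String.join("/", Arrays.copyOf(parts, n));
                totals.merge(key, entry.getSize(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JarSizeByPackage <jar> [depth]");
            return;
        }
        // Depth 3 groups entries as e.g. org/apache/poi (an assumption that suits Apache-style packages).
        int depth = args.length > 1 ? Integer.parseInt(args[1]) : 3;
        sizeByPrefix(Paths.get(args[0]), depth).entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .forEach(e -> System.out.printf("%8.1f MB  %s%n",
                    e.getValue() / 1_048_576.0, e.getKey()));
    }
}
```

Run it against the downloaded jar, passing the jar path as the first argument
and optionally a prefix depth as the second. The largest prefixes are listed
first, which should make it easy to see which bundled parser libraries
dominate the download.
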
> On 26 Sep 2020, at 19:06, Dave Fisher <[email protected]> wrote:
>
> IIRC, if you know you only want PDF extraction, then take a look at Apache
> PDFBox: PDFBox.apache.org
>
>> On Sep 26, 2020, at 10:00 AM, Laurence Vanhelsuwe <[email protected]>
>> wrote:
>>
>> Thanks for the explanation.
>>
>> I understand the approach, but in my particular use case I cannot
>> reasonably justify inflating my application size from 7 MB to 77 MB just to
>> add functionality amounting to less than 1% of the total.
>>
>> I guess there’s no way to surgically extract just the PDF metadata parsing
>> functionality from Tika?
>>
>> Laurence
>>
>>> On 26 Sep 2020, at 18:04, Keith Bennett <[email protected]> wrote:
>>>
>>> Tika coordinates the use of many external-to-Tika parser libraries, each
>>> with their own dependencies, for parsing many types of files. These parser
>>> libraries are bundled into the tika-app jar file for your convenience. I
>>> believe it's these libraries that make up the bulk of the download. For
>>> example, if you unzip the jar file and inspect the contents, you can see
>>> that just one of these parsers, Apache POI, accounts for 24 MB:
>>>
>>> % cd org/apache/poi
>>>
>>> % du -sh .
>>> 24M .
>>>
>>> - Keith
>>>
>>>> On Sat, Sep 26, 2020 at 7:54 AM Laurence Vanhelsuwe
>>>> <[email protected]>
>>>> wrote:
>>>>
>>>> I found Tika during a quest to extract PDF metadata in Java. Did I screw
>>>> up the JAR download, or is Tika really 70 MB?
>>>>
>>>> Kind regards,
>>>>
>>>> Laurence
>>>>
>>>>
>>
>