I've managed to get the PDFParser running in native mode, but I had to
delay the initialization of org.apache.pdfbox.pdmodel.font.PDType1Font.
This class has static PDType1Font instances, and one of them leads to
org.apache.fontbox.ttf.RAFDataStream, which opens a file handle. Graal
cannot carry an open file handle over into the native image at build time,
so the initialization of PDType1Font has to be delayed until run time.
If we start from the PDF parser, the call path to RAFDataStream starts
with these frames:
org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.verifyOrCreateDefaults(PDAcroForm.java:106)
    at org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.<init>(PDAcroForm.java:93)
    at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAcroForm(PDDocumentCatalog.java:108)
    at org.apache.tika.parser.pdf.PDFParser.handleXFAOnly(PDFParser.java:534)
I guess I may need to create a PR for PDFBox where RAFDataStream opens a
stream lazily, with a check like ensureOpen() being added to its read
methods...
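
To make the lazy-open idea concrete, here is a rough sketch of what I have in mind. It is not the actual RAFDataStream code: the class name LazyFileStream, its isOpen() helper and the choice of methods are made up for illustration, and only the ensureOpen() idea comes from the note above.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical stand-in for a RAFDataStream-like class; NOT the PDFBox API.
// The constructor only remembers the file; the handle is opened on first read.
final class LazyFileStream implements AutoCloseable {
    private final File file;
    private RandomAccessFile raf; // stays null until the first read or seek
    private boolean closed;

    LazyFileStream(File file) {
        this.file = file; // no file handle is opened here
    }

    // Opens the underlying file on first use; refuses to reopen after close().
    private void ensureOpen() throws IOException {
        if (closed) {
            throw new IOException("Stream is closed: " + file);
        }
        if (raf == null) {
            raf = new RandomAccessFile(file, "r");
        }
    }

    // Illustration-only helper to observe whether the handle exists yet.
    boolean isOpen() {
        return raf != null;
    }

    int read() throws IOException {
        ensureOpen();
        return raf.read();
    }

    int read(byte[] b, int off, int len) throws IOException {
        ensureOpen();
        return raf.read(b, off, len);
    }

    void seek(long pos) throws IOException {
        ensureOpen();
        raf.seek(pos);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        if (raf != null) {
            raf.close();
            raf = null;
        }
    }
}
```

With a shape like this, an instance created from a static initializer would hold only a File reference and no open handle, which is what should let Graal initialize the owning class at build time. Whether that is enough for PDType1Font still needs to be checked against the real PDFBox code.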
Sergey
On Fri, May 3, 2019 at 1:22 PM Sergey Beryozkin <[email protected]>
wrote:
> Yes, please add 'sergeyb', I've just assigned myself a CXF issue as
> 'sergeyb'. Sorry about these multiple ids, but indeed I'll try to keep
> using a single one.
>
> Thanks, Sergey
>
>
>
> On Fri, May 3, 2019 at 12:13 PM Tim Allison <[email protected]> wrote:
>
>> I can add 'sergeyb' if you'd prefer!
>>
>> On Fri, May 3, 2019 at 5:43 AM Sergey Beryozkin <[email protected]>
>> wrote:
>> >
>> > Though I might need to settle on the 'sergeyb' eventually since it is my
>> > apache committer id.
>> > Thanks...
>> >
>> > On Fri, May 3, 2019 at 10:29 AM Sergey Beryozkin <[email protected]>
>> > wrote:
>> >
>> > > Oh, I forgot I had a 'sergey_beryozkin' id as well, this is not good,
>> > > shows how long ago I did contribute :-) (did try sergey.beryozkin
>> though).
>> > >
>> > > Thanks for checking it, I've just assigned this issue to myself.
>> > > Cheers, Sergey
>> > >
>> > >
>> > > On Thu, May 2, 2019 at 6:08 PM Sergey Beryozkin <[email protected]
>> >
>> > > wrote:
>> > >
>> > >> Hi Tim
>> > >>
>> > >> I can't assign
>> > >> https://issues.apache.org/jira/browse/TIKA-2862
>> > >>
>> > >> to myself, I used to be able to assign, I know I had some time away
>> from
>> > >> Tika, but I'm keen to return with a few contributions :-)
>> > >> Please update my record for me to be able to assign the issues again
>> > >>
>> > >> Cheers, Sergey
>> > >>
>> > >> On Tue, Apr 30, 2019 at 6:22 PM Sergey Beryozkin <
>> [email protected]>
>> > >> wrote:
>> > >>
>> > >>> Hi Tim, All
>> > >>>
>> > >>> I've started working on integrating Tika with Quarkus [1]. The main
>> idea
>> > >>> is to be able to use Tika in the native image mode.
>> > >>> It's quite likely I'll start creating the PRs soon, to get the
>> native
>> > >>> image related issues resolved, these are related to some libraries
>> > >>> statically initializing FileDescriptors, etc.
>> > >>>
>> > >>> Thanks, Sergey
>> > >>>
>> > >>> [1]
>> > >>>
>> https://github.com/sberyozkin/quarkus/tree/tika_extension/extensions/tika
>> > >>> [2]
>> > >>>
>> https://github.com/sberyozkin/quarkus-quickstarts/tree/tika/getting-started-tika
>> > >>>
>> > >>>
>>
>