[ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154260#comment-15154260 ]

Maruan Sahyoun commented on TIKA-1857:
--------------------------------------
{quote}
Do I understand correctly then: no matter whether static or dynamic, try to
pull data from XFA; if that doesn't exist, fall back to the AcroForm?
{quote}
If you'd like to replicate Adobe Reader/Acrobat behavior, yes. BTW, I don't
know what will happen with PDF 2.0: XFA is deprecated there, which might have
implications for future versions.
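
The lookup order confirmed above (XFA first, AcroForm as the fallback) can be
sketched in plain Java. This is only an illustration: {{FormTextFallback}} and
{{extract}} are hypothetical names, not Tika or PDFBox API, and treating an
empty XFA string as "no XFA" is an assumption.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Tika or PDFBox; it only models the lookup order.
final class FormTextFallback {
    private FormTextFallback() {}

    /**
     * Returns the XFA text when it is present and non-empty (assumption:
     * empty counts as missing), otherwise the AcroForm text. The AcroForm
     * supplier is only invoked when the XFA lookup yields nothing.
     */
    static String extract(String xfaText, Supplier<String> acroFormText) {
        if (xfaText != null && !xfaText.isEmpty()) {
            return xfaText;
        }
        return acroFormText.get();
    }
}
```

Passing the AcroForm side as a supplier keeps the fallback lazy, so the
AcroForm fields are only walked when the XFA is missing.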
{quote}
Also, is there an obvious way to determine static vs. dynamic aside from
checking to see if there are fields in the AcroForm?
{quote}
There is {{PDAcroForm.xfaIsDynamic()}}, which gives you that information
(it checks whether there is an XFA and no AcroForm fields).
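
The check described above can be modelled without PDFBox as a small
classifier. This is a sketch under assumptions: {{XfaKind}}, {{Kind}} and
{{classify}} are hypothetical names, and the two inputs stand in for whatever
the real form object reports. With PDFBox on the classpath you would call
{{xfaIsDynamic()}} on the {{PDAcroForm}} instance instead.

```java
// Hypothetical stand-in for the static-vs-dynamic check; not PDFBox code.
final class XfaKind {
    private XfaKind() {}

    enum Kind { NONE, STATIC_XFA, DYNAMIC_XFA }

    /**
     * Dynamic: an XFA is present and the AcroForm has no fields.
     * Static: an XFA is present alongside AcroForm fields.
     * None: there is no XFA at all.
     */
    static Kind classify(boolean hasXfa, int acroFormFieldCount) {
        if (!hasXfa) {
            return Kind.NONE;
        }
        return acroFormFieldCount == 0 ? Kind.DYNAMIC_XFA : Kind.STATIC_XFA;
    }
}
```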
> Enhance PDFParser to extract text from XFA forms
> ------------------------------------------------
>
> Key: TIKA-1857
> URL: https://issues.apache.org/jira/browse/TIKA-1857
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Pascal Essiembre
> Priority: Trivial
> Labels: patch
> Fix For: 1.13
>
> Attachments: 041617_filled_out.pdf, xfa_in_govdocs1.txt
>
>
> Extract text from PDF Forms (XFA). Information about XFA:
> https://en.wikipedia.org/wiki/XFA