[jira] [Commented] (TIKA-1857) Enhance PDFParser to extract text from XFA forms

Maruan Sahyoun (JIRA) Fri, 19 Feb 2016 06:19:55 -0800

    [ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154260#comment-15154260 ]


Maruan Sahyoun commented on TIKA-1857:
--------------------------------------

{quote}
Do I understand correctly then: no matter whether static or dynamic, try to 
pull data from XFA; if that doesn't exist, fall back to the AcroForm?
{quote}

If you'd like to replicate Adobe Reader/Acrobat behavior, yes. BTW, I don't know
what will happen with PDF 2.0, where XFA is deprecated; that might have
implications for future versions.
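The sketch below shows the selection order described above: XFA data wins whenever it exists, whether the form is static or dynamic, and the AcroForm is only a fallback. It is a self-contained model, not Tika or PDFBox code. The class `FormTextSource`, the enum `Source`, and the inputs `hasXfaData` and `acroFormFieldCount` are hypothetical stand-ins for values a real parser would read from the PDF.

```java
// Hypothetical model of the Reader/Acrobat-style selection order:
// prefer XFA data, fall back to AcroForm fields, otherwise nothing.
final class FormTextSource {

    enum Source { XFA, ACROFORM, NONE }

    // hasXfaData and acroFormFieldCount are assumed inputs, not a real API.
    static Source choose(boolean hasXfaData, int acroFormFieldCount) {
        if (hasXfaData) {
            return Source.XFA;        // static or dynamic: try XFA first
        }
        if (acroFormFieldCount > 0) {
            return Source.ACROFORM;   // no XFA present: use AcroForm fields
        }
        return Source.NONE;           // no interactive form content
    }

    public static void main(String[] args) {
        System.out.println(choose(true, 12)); // XFA (static XFA form)
        System.out.println(choose(true, 0));  // XFA (dynamic XFA form)
        System.out.println(choose(false, 5)); // ACROFORM
        System.out.println(choose(false, 0)); // NONE
    }
}
```

Keeping the decision in one small function makes it easy to revisit if XFA handling changes once PDF 2.0 documents, where XFA is deprecated, become common.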

{quote}
Also, is there an obvious way to determine static vs. dynamic aside from 
checking to see if there are fields in the AcroForm?
{quote}

There is {{PDAcroForm.xfaIsDynamic()}}, which will give you that information
(it checks whether there is XFA and no AcroForm fields).
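As a rough illustration of the check attributed to {{PDAcroForm.xfaIsDynamic()}} above, here is a self-contained model of the static vs. dynamic distinction. It does not call PDFBox; the class `XfaKind` and its boolean/list inputs are hypothetical stand-ins for "the document carries an XFA resource" and "the AcroForm field list".

```java
import java.util.List;

// Hypothetical model of the static vs. dynamic XFA distinction:
// dynamic XFA = XFA present and no AcroForm fields.
final class XfaKind {

    static boolean xfaIsDynamic(boolean hasXfa, List<String> acroFormFields) {
        return hasXfa && acroFormFields.isEmpty();
    }

    // Static XFA forms also carry a parallel AcroForm field tree.
    static boolean xfaIsStatic(boolean hasXfa, List<String> acroFormFields) {
        return hasXfa && !acroFormFields.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(xfaIsDynamic(true, List.of()));        // true
        System.out.println(xfaIsDynamic(true, List.of("name")));  // false
        System.out.println(xfaIsStatic(true, List.of("name")));   // true
        System.out.println(xfaIsDynamic(false, List.of()));       // false
    }
}
```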

> Enhance PDFParser to extract text from XFA forms
> ------------------------------------------------
>
>                 Key: TIKA-1857
>                 URL: https://issues.apache.org/jira/browse/TIKA-1857
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Pascal Essiembre
>            Priority: Trivial
>              Labels: patch
>             Fix For: 1.13
>
>         Attachments: 041617_filled_out.pdf, xfa_in_govdocs1.txt
>
>
> Extract text from PDF Forms (XFA).  Information about XFA: 
> https://en.wikipedia.org/wiki/XFA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

