[ 
https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149405#comment-15149405
 ] 

Maruan Sahyoun commented on TIKA-1857:
--------------------------------------

The reason you are not getting the data is that it is stored in the data node 
of an XML data structure that matches the binding information in the field. 
The data is in {{xfa.datasets.data}}, with the {{my_exibitor}} value stored in 
the {{Exhibitorname}} field.

Extracting {{speak|text|exData}} will give you the boilerplate text but not the 
field value.
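
To illustrate where the value lives, here is a minimal sketch that reads the 
leaf elements under the XFA data node using only the JDK XML parser. The 
{{SAMPLE}} packet is a hypothetical, heavily simplified stand-in for a real XFA 
stream (it is not taken from the attached PDF), and the class and method names 
are made up for this example. Getting the XFA stream out of the PDF is assumed 
to happen elsewhere.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XfaDataSketch {
    // Namespace of the XFA datasets/data elements.
    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    // Hypothetical, simplified XFA packet: the template holds the caption
    // (boilerplate) and the binding, the datasets part holds the value.
    static final String SAMPLE =
          "<xdp:xdp xmlns:xdp='http://ns.adobe.com/xdp/'>"
        + "<template xmlns='http://www.xfa.org/schema/xfa-template/2.8/'>"
        + "<subform name='form1'><field name='Exhibitorname'>"
        + "<caption><value><text>Exhibitor name:</text></value></caption>"
        + "<bind match='dataRef' ref='$.Exhibitorname'/>"
        + "</field></subform></template>"
        + "<xfa:datasets xmlns:xfa='http://www.xfa.org/schema/xfa-data/1.0/'>"
        + "<xfa:data><form1><Exhibitorname>my_exibitor</Exhibitorname></form1></xfa:data>"
        + "</xfa:datasets></xdp:xdp>";

    // Returns element name -> text for every leaf element below xfa:data.
    static Map<String, String> extractFieldValues(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> values = new LinkedHashMap<>();
        NodeList dataNodes = doc.getElementsByTagNameNS(XFA_DATA_NS, "data");
        for (int i = 0; i < dataNodes.getLength(); i++) {
            for (Node n = dataNodes.item(i).getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element) {
                    collectLeaves((Element) n, values);
                }
            }
        }
        return values;
    }

    // Recurses into child elements; an element without child elements is a leaf.
    static void collectLeaves(Element element, Map<String, String> out) {
        boolean hasChildElement = false;
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                hasChildElement = true;
                collectLeaves((Element) n, out);
            }
        }
        if (!hasChildElement) {
            out.put(element.getLocalName(), element.getTextContent().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractFieldValues(SAMPLE));
    }
}
```

Note that the caption text in the template part is not returned; only the 
content below the data node is. Keying the map by local element name is a 
simplification, since repeated names in a real form would overwrite each other.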

There are two types of XFA forms: static and dynamic. Static XFA forms 
will have an XFA entry and AcroForm fields. Dynamic XFA forms will only have an 
XFA entry and no AcroForm fields.

When a static XFA form is filled out with an XFA-aware PDF processor, both the 
{{xfa.datasets.data}} information and the {{V}} entry of the AcroForm form 
field are updated. If you fill out a static form with a non-XFA-aware PDF 
processor, it will only see the AcroForm information and, as a result, will 
only update the {{V}} entry of the AcroForm form fields.

A non-XFA-aware PDF processor trying to fill a dynamic XFA form will not see 
any form fields at all.
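
The rules above can be sketched as a small decision table. This is only an 
illustration: the class, enum and method names are hypothetical, and the two 
inputs (whether an XFA entry exists and how many AcroForm fields there are) 
are assumed to come from whatever PDF library is in use. The dynamic form 
filled by an XFA-aware processor is not spelled out above; the sketch assumes 
only the XFA data is updated in that case, because there are no AcroForm 
fields to carry a {{V}} entry.

```java
import java.util.EnumSet;
import java.util.Set;

class XfaFormKindSketch {
    enum FormKind { NOT_XFA, STATIC_XFA, DYNAMIC_XFA }

    // The two places a field value can end up.
    enum Store { XFA_DATASETS, ACROFORM_V }

    // Static: XFA entry plus AcroForm fields. Dynamic: XFA entry only.
    static FormKind classify(boolean hasXfaEntry, int acroFormFieldCount) {
        if (!hasXfaEntry) {
            return FormKind.NOT_XFA;
        }
        return acroFormFieldCount > 0 ? FormKind.STATIC_XFA : FormKind.DYNAMIC_XFA;
    }

    // Which stores a processor updates when it fills out the form.
    static Set<Store> storesUpdatedOnFill(FormKind kind, boolean xfaAware) {
        switch (kind) {
            case STATIC_XFA:
                return xfaAware
                        ? EnumSet.of(Store.XFA_DATASETS, Store.ACROFORM_V)
                        : EnumSet.of(Store.ACROFORM_V);
            case DYNAMIC_XFA:
                // Assumption for the XFA-aware case: no AcroForm fields exist,
                // so only the XFA data can be updated. A non-XFA-aware
                // processor sees no fields and updates nothing.
                return xfaAware
                        ? EnumSet.of(Store.XFA_DATASETS)
                        : EnumSet.noneOf(Store.class);
            default:
                throw new IllegalArgumentException("not an XFA form: " + kind);
        }
    }
}
```

For an extractor this suggests that, for static forms, the {{V}} entries and 
the XFA data may disagree depending on which kind of processor last filled out 
the form.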

I'm happy to provide more information on this topic, but thought that this 
would give you a first outline.

> Enhance PDFParser to extract text from XFA forms
> ------------------------------------------------
>
>                 Key: TIKA-1857
>                 URL: https://issues.apache.org/jira/browse/TIKA-1857
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Pascal Essiembre
>            Priority: Trivial
>              Labels: patch
>             Fix For: 1.13
>
>         Attachments: 041617_filled_out.pdf, xfa_in_govdocs1.txt
>
>
> Extract text from PDF Forms (XFA).  Information about XFA: 
> https://en.wikipedia.org/wiki/XFA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
