[jira] [Comment Edited] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files

John Hewson (JIRA) Wed, 13 Aug 2014 11:20:44 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095841#comment-14095841
 ]


John Hewson edited comment on PDFBOX-2261 at 8/13/14 6:18 PM:
--------------------------------------------------------------

I encountered this issue in PDFBOX-2164 but only added a workaround for a 
specific NPE. The recursive approach used by PDField was indeed incorrect; the 
PDF spec explains why:

{quote}
For purposes of definition and naming, the fields can be organized 
hierarchically and can inherit attributes from their ancestors in the field 
hierarchy
{quote}

The problem with PDFBox's current design seems to be that each node in the 
field tree is represented by a PDField. However, not every node in the field 
tree is really a field; some nodes are there only to organise the tree 
structure. One solution would be to have PDAcroForm read the field tree and 
produce a Map<String, PDField> of named fields, with all of the inheritance 
taken into account. Another solution would be to have fields be aware of their 
parent in the field tree and look up the appropriate values there (this would 
preserve the field tree structure between writes). In that case the parent node 
should not be a PDField (!!!); it should be a PDNonTerminalField* or some 
similar new class. The PDF spec is clear on this:

{quote}
A non-terminal field does not logically have a type of its own; it is merely a 
container for inheritable attributes that are intended for descendant terminal 
fields of any type.
{quote}
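
For the first option, the non-terminal nodes described in the quote only 
contribute names and inherited values, so the tree can be flattened into a map 
keyed by fully qualified name. Below is a rough sketch of that idea, not a 
patch. Node, Frame, FieldTreeFlattener and flatten() are hypothetical stand-ins 
and not PDFBox classes; attributes are plain strings; and every leaf is assumed 
to be a terminal field (widget kids are ignored). The traversal is iterative 
and skips nodes it has already seen, so it terminates even on a malformed tree 
with cyclic kids.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types exist in PDFBox.
class FieldTreeFlattener {

    /** One node of the field tree: partial name, own attributes, kids. */
    static final class Node {
        final String partialName;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> kids = new ArrayList<>();

        Node(String partialName) { this.partialName = partialName; }

        Node attr(String key, String value) { attributes.put(key, value); return this; }
        Node kid(Node child) { kids.add(child); return this; }
    }

    /** Pending work item: a node plus the name prefix and attributes inherited so far. */
    private static final class Frame {
        final Node node;
        final String prefix;
        final Map<String, String> inherited;

        Frame(Node node, String prefix, Map<String, String> inherited) {
            this.node = node;
            this.prefix = prefix;
            this.inherited = inherited;
        }
    }

    /** Returns fully qualified name -> resolved attributes for every leaf node. */
    static Map<String, Map<String, String>> flatten(List<Node> roots) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        // Identity-based set, so a node reachable twice (or via a cycle) is processed once.
        Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Frame> queue = new ArrayDeque<>();
        for (Node root : roots) {
            queue.addLast(new Frame(root, "", Collections.emptyMap()));
        }
        while (!queue.isEmpty()) {
            Frame f = queue.removeFirst();
            if (!visited.add(f.node)) {
                continue; // already handled; avoids looping forever on cyclic kids
            }
            // Partial names are joined with periods to form the fully qualified name.
            String name = f.prefix.isEmpty()
                    ? f.node.partialName
                    : f.prefix + "." + f.node.partialName;
            Map<String, String> resolved = new LinkedHashMap<>(f.inherited);
            resolved.putAll(f.node.attributes); // own values override inherited ones
            if (f.node.kids.isEmpty()) {
                result.put(name, resolved); // leaf: treated as a terminal field
            } else {
                for (Node kid : f.node.kids) {
                    queue.addLast(new Frame(kid, name, resolved));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node address = new Node("address").attr("FT", "Tx").attr("Ff", "0")
                .kid(new Node("street"))
                .kid(new Node("city").attr("Ff", "2"));
        flatten(Collections.singletonList(address))
                .forEach((name, attrs) -> System.out.println(name + " " + attrs));
    }
}
```

The trade-off is the one noted above: a flat map is simple to query but does 
not preserve the tree structure for writing the document back out.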

\* Any new PDNonTerminalField class should probably not inherit from PDField, 
either.
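
To make the second option more concrete, here is a minimal sketch of 
parent-aware lookup with two unrelated types. ParentAwareFields, 
NonTerminalNode and TerminalField are hypothetical names, not existing PDFBox 
classes, and attributes are reduced to strings. Neither type extends the other, 
and the parent link is fixed at construction, so the upward walk cannot loop.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these types exist in PDFBox.
class ParentAwareFields {

    /** Container for inheritable attributes; deliberately not a field. */
    static final class NonTerminalNode {
        final String partialName;
        final NonTerminalNode parent; // null at the root of the field tree
        final Map<String, String> inheritable = new HashMap<>();

        NonTerminalNode(String partialName, NonTerminalNode parent) {
            this.partialName = partialName;
            this.parent = parent;
        }

        /** Walks up the ancestors iteratively until some node defines the key. */
        Optional<String> lookup(String key) {
            for (NonTerminalNode n = this; n != null; n = n.parent) {
                String value = n.inheritable.get(key);
                if (value != null) {
                    return Optional.of(value);
                }
            }
            return Optional.empty();
        }
    }

    /** A real field: has its own attributes and knows its parent node. */
    static final class TerminalField {
        final String partialName;
        final NonTerminalNode parent; // may be null for a top-level field
        final Map<String, String> own = new HashMap<>();

        TerminalField(String partialName, NonTerminalNode parent) {
            this.partialName = partialName;
            this.parent = parent;
        }

        /** Own value first, then the nearest ancestor that defines the key. */
        Optional<String> get(String key) {
            String value = own.get(key);
            if (value != null) {
                return Optional.of(value);
            }
            return parent == null ? Optional.empty() : parent.lookup(key);
        }

        /** Joins the partial names from the root down to this field with periods. */
        String fullyQualifiedName() {
            StringBuilder sb = new StringBuilder(partialName);
            for (NonTerminalNode n = parent; n != null; n = n.parent) {
                sb.insert(0, n.partialName + ".");
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        NonTerminalNode form = new NonTerminalNode("form", null);
        form.inheritable.put("FT", "Tx");
        NonTerminalNode address = new NonTerminalNode("address", form);
        TerminalField city = new TerminalField("city", address);
        city.own.put("Ff", "2");
        System.out.println(city.fullyQualifiedName() + " FT=" + city.get("FT").orElse("(none)"));
    }
}
```

Because values are resolved at lookup time rather than copied, the tree itself 
stays untouched and could be written back as it was read.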


was (Author: jahewson):
I encountered this issue in PDFBOX-2164 but only added a workaround for a 
specific NPE. The recursive approach used by PDField was indeed incorrect, the 
PDF spec explains why:

{quote}
For purposes of definition and naming, the fields can be organized 
hierarchically and can inherit attributes from their ancestors in the field 
hierarchy
{quote}

It seems that the problem with PDFBox's current design is that each node in the 
field tree is represented by a PDField, however not every node in the field 
tree is really a field, some nodes are just there to organise the tree 
structure. One solution would be to have PDAcroForm read the field tree and 
have it produce a Map<String, PDField> of named fields, with all of the 
inheritance taken into account. Another solution would be to have fields be 
aware of their parent in the field tree and look-up appropriate values (this 
would preserve the field tree structure between writes), but the parent node 
should not be a PDField (!!!) it should be PDNonTerminalField* or some similar 
new class, the PDF spec is clear on this:

{quote}
A non-terminal field does not logically have a type of its own; it is merely a 
container for inheritable attributes that are intended for descendant terminal 
fields of any type.
{quote}

\* Any new PDNonTerminalField class should not inherit from PDField, either.

> Extremely long hang during getFields() on a few PDF files
> ---------------------------------------------------------
>
>                 Key: PDFBOX-2261
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2261
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 1.8.6
>            Reporter: Tim Allison
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: 966679.pdf, RadioButtons.pdf, screenshot-pdfdebugger.png
>
>
> When I run oap.examples.fdf.PrintFields from trunk, the code seems to hang 
> during acroForm.getFields().  This is a heavy load hang.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
