[ https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095841#comment-14095841 ]
John Hewson edited comment on PDFBOX-2261 at 8/13/14 6:18 PM:
--------------------------------------------------------------

I encountered this issue in PDFBOX-2164 but only added a workaround for a specific NPE.

The recursive approach used by PDField was indeed incorrect, the PDF spec explains why:

{quote}
For purposes of definition and naming, the fields can be organized hierarchically and can inherit attributes from their ancestors in the field hierarchy
{quote}

It seems that the problem with PDFBox's current design is that each node in the field tree is represented by a PDField, however not every node in the field tree is really a field, some nodes are just there to organise the tree structure.

One solution would be to have PDAcroForm read the field tree and have it produce a Map<String, PDField> of named fields, with all of the inheritance taken into account.

Another solution would be to have fields be aware of their parent in the field tree and look-up appropriate values (this would preserve the field tree structure between writes), but the parent node should not be a PDField (!!!) it should be PDNonTerminalField* or some similar new class, the PDF spec is clear on this:

{quote}
A non-terminal field does not logically have a type of its own; it is merely a container for inheritable attributes that are intended for descendant terminal fields of any type.
{quote}

\* Any new PDNonTerminalField class should probably not inherit from PDField, either.


was (Author: jahewson):
I encountered this issue in PDFBOX-2164 but only added a workaround for a specific NPE.

The recursive approach used by PDField was indeed incorrect, the PDF spec explains why:

{quote}
For purposes of definition and naming, the fields can be organized hierarchically and can inherit attributes from their ancestors in the field hierarchy
{quote}

It seems that the problem with PDFBox's current design is that each node in the field tree is represented by a PDField, however not every node in the field tree is really a field, some nodes are just there to organise the tree structure.

One solution would be to have PDAcroForm read the field tree and have it produce a Map<String, PDField> of named fields, with all of the inheritance taken into account.

Another solution would be to have fields be aware of their parent in the field tree and look-up appropriate values (this would preserve the field tree structure between writes), but the parent node should not be a PDField (!!!) it should be PDNonTerminalField* or some similar new class, the PDF spec is clear on this:

{quote}
A non-terminal field does not logically have a type of its own; it is merely a container for inheritable attributes that are intended for descendant terminal fields of any type.
{quote}

\* Any new PDNonTerminalField class should not inherit from PDField, either.

> Extremely long hang during getFields() on a few PDF files
> ---------------------------------------------------------
>
>                 Key: PDFBOX-2261
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2261
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 1.8.6
>            Reporter: Tim Allison
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: 966679.pdf, RadioButtons.pdf, screenshot-pdfdebugger.png
>
>
> When I run oap.examples.fdf.PrintFields from trunk, the code seems to hang
> during acroForm.getFields(). This is a heavy load hang.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
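
As a rough illustration of the first solution proposed in the comment (reading the field tree once and producing a map of named fields with inheritance already applied), here is a minimal Java sketch. FieldNode, flatten and the string-valued attribute map are hypothetical stand-ins invented for this example; they are not PDFBox classes or API. The sketch assumes that a node without kids is a terminal field, that fully qualified names are partial names joined with a period, and that a node's own attributes override inherited ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldTreeSketch {

    /** Hypothetical stand-in for one node of the field tree (not a PDFBox class). */
    public static final class FieldNode {
        public final String partialName;                                // the node's partial name
        public final Map<String, String> attributes = new HashMap<>();  // e.g. FT, Ff, DV
        public final List<FieldNode> kids = new ArrayList<>();

        public FieldNode(String partialName) {
            this.partialName = partialName;
        }

        public FieldNode attr(String key, String value) {
            attributes.put(key, value);
            return this;
        }

        public FieldNode kid(FieldNode child) {
            kids.add(child);
            return this;
        }
    }

    /** Maps the fully qualified name of each terminal field to its resolved attributes. */
    public static Map<String, Map<String, String>> flatten(List<FieldNode> roots) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (FieldNode root : roots) {
            walk(root, "", new HashMap<>(), result);
        }
        return result;
    }

    // Visits every node exactly once, carrying inherited attributes downwards.
    private static void walk(FieldNode node, String parentName,
                             Map<String, String> inherited,
                             Map<String, Map<String, String>> out) {
        String name = parentName.isEmpty()
                ? node.partialName
                : parentName + "." + node.partialName;

        // Start from the inherited values, then let the node's own entries override them.
        Map<String, String> resolved = new HashMap<>(inherited);
        resolved.putAll(node.attributes);

        if (node.kids.isEmpty()) {
            // Terminal field: only these end up in the result map.
            out.put(name, resolved);
        } else {
            // Non-terminal node: a container that only passes attributes down.
            for (FieldNode kid : node.kids) {
                walk(kid, name, resolved, out);
            }
        }
    }

    public static void main(String[] args) {
        FieldNode form = new FieldNode("form").attr("FT", "Tx")
                .kid(new FieldNode("name"))
                .kid(new FieldNode("agree").attr("FT", "Btn"));
        System.out.println(flatten(Arrays.asList(form)));
    }
}
```

Because inherited values are passed down during a single traversal, no terminal field has to walk back up through its ancestors on every lookup. The trade-off noted in the comment still applies: a flat map like this does not by itself preserve the tree structure between writes.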