[jira] [Commented] (PDFBOX-5519) Can PDFbox create the ability to extract tab numbers from pdf fields?

Tilman Hausherr (Jira) Fri, 23 Sep 2022 23:03:06 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608949#comment-17608949
 ]


Tilman Hausherr commented on PDFBOX-5519:
-----------------------------------------

{quote}
Is checking the order of the page annotations the best way to go about this?
{quote}
To be honest, I don't know. But for your file it seems to be.
{quote}
how are you able to view all of the annotations in the tree structure
{quote}
This is from PDFDebugger. You could get the named order by altering this code:
{code}
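        // Passing null skips the AcroForm fixups. This overload is from PDFBox 3.x;
        // on 2.0.x call getAcroForm() without an argument.
        PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm(null);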
        if (acroForm == null)
        {
            return;
        }
        Set<COSDictionary> dictionarySet = new HashSet<>();
        for (PDAnnotation annotation : page.getAnnotations())
        {
            dictionarySet.add(annotation.getCOSObject());
        }
        for (PDField field : acroForm.getFieldTree())
        {
            for (PDAnnotationWidget widget : field.getWidgets())
            {
                // check if the annotation widget is on this page
                // (checking widget.getPage() also works, but it is sometimes null)
                if (dictionarySet.contains(widget.getCOSObject()))
                {
                    rectMap.put(widget.getRectangle(),
                            "Field name: " + field.getFullyQualifiedName());
                }
            }
        }
{code}
The change would be to iterate over the page annotations in their stored order 
and, for each one, look up which field owns that annotation widget.
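
A minimal sketch of that lookup, assuming the page's annotation order really is the tab order (as it seems to be for the attached file). The class and method names here are made up for illustration and are not part of PDFBox. With PDFBox, the list would come from page.getAnnotations() (each annotation's getCOSObject()), and the map would be filled by looping over acroForm.getFieldTree() and field.getWidgets(), keyed by widget.getCOSObject() with field.getFullyQualifiedName() as the value. The core is kept generic so it can be tried without a PDF:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TabOrderSketch
{
    /**
     * Walks the annotations in page order and returns the names of the
     * fields that own them, in that same order.
     *
     * @param annotationsInPageOrder keys of the page annotations, in stored order
     * @param fieldNameByWidget      widget key -> fully qualified field name
     */
    static <K> List<String> fieldNamesInAnnotationOrder(
            List<K> annotationsInPageOrder, Map<K, String> fieldNameByWidget)
    {
        List<String> ordered = new ArrayList<>();
        for (K annotation : annotationsInPageOrder)
        {
            String name = fieldNameByWidget.get(annotation);
            // annotations that are not field widgets (e.g. links) have no entry
            if (name != null)
            {
                ordered.add(name);
            }
        }
        return ordered;
    }

    public static void main(String[] args)
    {
        // stand-in keys; with PDFBox these would be the COSDictionary objects
        List<String> annotations = Arrays.asList("widgetB", "link", "widgetA");
        Map<String, String> fieldNameByWidget = new HashMap<>();
        fieldNameByWidget.put("widgetA", "form.city");
        fieldNameByWidget.put("widgetB", "form.name");

        List<String> names = fieldNamesInAnnotationOrder(annotations, fieldNameByWidget);
        for (int i = 0; i < names.size(); i++)
        {
            // position in the list + 1 is a candidate HTML tabindex
            System.out.println("tabindex=" + (i + 1) + " field=" + names.get(i));
        }
    }
}
```

A field with several widgets on the same page would appear once per widget, so the position belongs to the widget rather than to the field.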

> Can PDFbox create the ability to extract tab numbers from pdf fields? 
> ----------------------------------------------------------------------
>
>                 Key: PDFBOX-5519
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5519
>             Project: PDFBox
>          Issue Type: Wish
>    Affects Versions: 2.0.26
>            Reporter: Tony C
>            Priority: Major
>         Attachments: DummyPdf-1.pdf, Screen Shot 2022-09-23 at 5.19.50 PM-1.png, 
> image-2022-09-24-05-36-27-659.png
>
>
> I am in the process of converting a PDF into HTML. The PDF I am using has tab 
> numbers set on its fields.
> This is where I run into an issue. I am trying to extract the tab number 
> from the PDF fields, but I don't think the library offers that. I need 
> that value in order to set the tabindex when I create the corresponding HTML 
> elements.
> The PDF I am using is [^DummyPdf.pdf]
> ^!Screen Shot 2022-09-23 at 5.19.50 PM.png!^


