[jira] [Commented] (TIKA-4010) Add boolean metadata element for isLinearized for PDFs

Tim Allison (Jira) Mon, 10 Apr 2023 11:27:06 -0700


    [ https://issues.apache.org/jira/browse/TIKA-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710261#comment-17710261 ]


Tim Allison commented on TIKA-4010:
-----------------------------------

We don't have far to look for examples:
{noformat}
testPDF_Version.4.x.pdf
testPDF_Version.11.x.PDFA-1b.pdf
testPDF_Version.6.x.pdf
testPDF_protected.pdf
testFlashInPDF.pdf
testPDF_Version.7.x.pdf
testPDF_acroform3.pdf
testPDF_Version.5.x.pdf
testPDF_twoAuthors.pdf
testPDF_XFA_govdocs1_258578.pdf
testPDFFileEmbInAnnotation.pdf
testPDF_Version.9.x.pdf
testPDFPackage.pdf
testPDF_rotated.pdf
testPopupAnnotation.pdf
testPDF_Version.10.x.pdf
testPDF_Version.8.x.pdf
{noformat}

> Add boolean metadata element for isLinearized for PDFs
> ------------------------------------------------------
>
>                 Key: TIKA-4010
>                 URL: https://issues.apache.org/jira/browse/TIKA-4010
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Tim Allison
>            Priority: Trivial
>
> Other tools such as pdfinfo extract information about whether or not the PDF 
> is linearized.  We should do that as well.
> In PDFBox 3.x, we can simply call {{.getLinearizedDictionary()}} on the 
> COSDocument.  In 2.x, I tried to port that underlying code without success; 
> the Linearized dictionary was not being parsed as a dictionary.
> I don't think this has a high priority.  I'm happy enough waiting for 3.x.  
> However, if there's a straightforward way to do this with 2.x, let's do that.
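For comparison with the PDFBox 3.x route, here is a minimal, stdlib-only sketch of what a linearization check looks at, based on the PDF specification (ISO 32000-1, Annex F). A linearized file's first indirect object is a linearization parameter dictionary. That dictionary contains the /Linearized key, must lie entirely within the first 1024 bytes, and has an /L entry giving the length of the whole file. The sketch assumes that, as the spec describes /L, a file whose real size differs from /L (e.g. after an incremental update) should be treated as not linearized. The class `LinearizationCheck` and its methods are hypothetical, not Tika or PDFBox API, and the regex scan is a heuristic that skips cases a real parser handles, such as object-like text inside comments.

```java
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Tika or PDFBox.
class LinearizationCheck {

    // The linearization parameter dictionary must sit entirely within
    // the first 1024 bytes of a linearized file (ISO 32000-1, Annex F).
    static final int HEADER_WINDOW = 1024;

    // Header of the first indirect object, e.g. "1 0 obj", at the start of a line.
    private static final Pattern FIRST_OBJ =
            Pattern.compile("(?m)^\\d+\\s+\\d+\\s+obj\\b");
    // The /Linearized key itself, not a longer name that merely starts with it.
    private static final Pattern LINEARIZED_KEY = Pattern.compile("/Linearized\\b");
    // The /L entry: declared length of the entire file in bytes.
    private static final Pattern LENGTH_KEY = Pattern.compile("/L\\s+(\\d+)");

    /** Returns the first object's dictionary text if it lies inside the window, else null. */
    static String firstObjectDictionary(byte[] head) {
        int len = Math.min(head.length, HEADER_WINDOW);
        // ISO-8859-1 maps every byte to exactly one char, so binary bytes keep offsets intact.
        String window = new String(head, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = FIRST_OBJ.matcher(window);
        if (!m.find()) return null;
        int start = window.indexOf("<<", m.end());
        // The object must *be* a dictionary: only whitespace between "obj" and "<<".
        if (start < 0 || !window.substring(m.end(), start).isBlank()) return null;
        // The linearization dictionary holds no nested dictionaries,
        // so the first ">>" closes it; if it is missing, it spills past the window.
        int end = window.indexOf(">>", start);
        return end < 0 ? null : window.substring(start, end + 2);
    }

    /** True if the first object in the window carries the /Linearized key. */
    static boolean hasLinearizationDictionary(byte[] head) {
        String dict = firstObjectDictionary(head);
        return dict != null && LINEARIZED_KEY.matcher(dict).find();
    }

    /** Dictionary check plus the /L-equals-file-size check. */
    static boolean isLikelyLinearized(Path pdf) throws IOException {
        byte[] head;
        try (InputStream in = Files.newInputStream(pdf)) {
            head = in.readNBytes(HEADER_WINDOW);
        }
        String dict = firstObjectDictionary(head);
        if (dict == null || !LINEARIZED_KEY.matcher(dict).find()) return false;
        Matcher l = LENGTH_KEY.matcher(dict);
        // BigInteger avoids overflow on absurd values and ignores leading zeros.
        return l.find()
                && new BigInteger(l.group(1)).equals(BigInteger.valueOf(Files.size(pdf)));
    }

    public static void main(String[] args) throws IOException {
        // Prints one "path<TAB>true|false" line per file argument.
        for (String arg : args) {
            System.out.println(arg + "\t" + isLikelyLinearized(Paths.get(arg)));
        }
    }
}
```

Run as `java LinearizationCheck.java a.pdf b.pdf` (Java 11+ single-file launch). Checking /L against the real size is what separates "was written linearized" from "is still linearized"; a plain /Linearized check alone would also flag files that were later updated incrementally.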



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
