[ https://issues.apache.org/jira/browse/PDFBOX-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637502#comment-14637502 ]
Tilman Hausherr commented on PDFBOX-2894:
-----------------------------------------
The last commit prevents exceptions like this:
{code}
java.lang.ClassCastException: org.apache.pdfbox.cos.COSDictionary cannot be cast to org.apache.pdfbox.cos.COSStream
	at org.apache.pdfbox.preflight.process.MetadataValidationProcess.getXpacket(MetadataValidationProcess.java:278)
	at org.apache.pdfbox.preflight.process.MetadataValidationProcess.validate(MetadataValidationProcess.java:69)
	at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
	at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:122)
	at org.apache.pdfbox.preflight.PreflightDocument.validate(PreflightDocument.java:163)
	at com.mycompany.preflightmasstest.PreflightChecker.run(PreflightChecker.java:52)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
{code}
which occur with content like this (found in files 272372 and 333472):
{code}
539 0 obj << /Type /Metadata /Subtype /XML >> endobj
{code}
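The exception arises because the metadata object above is a bare dictionary with no stream data, while the validator cast it to a stream without checking its type. Below is a minimal sketch of the defensive pattern (check the runtime type before casting). It uses small hypothetical stand-in classes that only mimic the COSDictionary/COSStream hierarchy (a stream being a dictionary plus data); it is not PDFBox code and not the code of the actual commit.

{code}
import java.util.HashMap;
import java.util.Map;

public class MetadataCheckSketch {
    // Hypothetical stand-ins for PDFBox's COS types, NOT the real classes.
    static class CosBase {}

    static class CosDictionary extends CosBase {
        final Map<String, CosBase> entries = new HashMap<>();
    }

    // Like COSStream, a stream is a dictionary that also carries data.
    static class CosStream extends CosDictionary {
        final byte[] data;

        CosStream(byte[] data) {
            this.data = data;
        }
    }

    /** Returns the metadata stream, or null if the entry is missing or not a stream. */
    static CosStream metadataStream(CosDictionary catalog) {
        CosBase metadata = catalog.entries.get("Metadata");
        // Check the runtime type instead of casting blindly: a bare dictionary such as
        // << /Type /Metadata /Subtype /XML >> would otherwise throw ClassCastException.
        if (metadata instanceof CosStream) {
            return (CosStream) metadata;
        }
        return null; // a real validator would record a validation error here instead
    }

    public static void main(String[] args) {
        CosDictionary catalog = new CosDictionary();
        catalog.entries.put("Metadata", new CosDictionary()); // like object 539 0 above
        System.out.println(metadataStream(catalog)); // prints null, no exception
        catalog.entries.put("Metadata", new CosStream(new byte[] {1, 2, 3}));
        System.out.println(metadataStream(catalog).data.length); // prints 3
    }
}
{code}

Returning null here is only one option; a preflight validator would more likely turn the bad type into a reported validation error rather than silently skipping it.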
> Remove COSStreamArray / SequenceRandomAccessRead
> ------------------------------------------------
>
> Key: PDFBOX-2894
> URL: https://issues.apache.org/jira/browse/PDFBOX-2894
> Project: PDFBox
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: John Hewson
> Assignee: John Hewson
> Fix For: 2.0.0
>
> Attachments: 166292-fi-ligature.pdf, 166292-fi-ligature_unc.pdf
>
>
> This ties in with my COSStream simplification in PDFBOX-2893.
> COSStreamArray is a troublesome abstraction: it's not a real COS object, and
> it's the only COS object which can be generated _after_ parsing. Look at the
> implementation of COSStreamArray: most methods throw an exception because
> it's _not_ a COSStream, so it violates the contract of the very thing it
> claims to be. Even PDPageContentStream has to use instanceof to "peer through"
> the abstraction of COSStreamArray.
> There's no reason to have this class, other than to duct-tape flaws in 1.8's
> APIs, namely that PDPage#getStream() returns a PDStream and PDFStreamParser
> expects a PDStream, yet in both cases the content may be an array of streams.
> We can fix this in 2.0 by getting rid of the erroneous PDPage#getStream() and
> by exposing the array of streams, rather than attempting to hide them.
> Hopefully this will also fix existing errors which may be lurking throughout
> the codebase (see first comment, below) and which stem from mistaking
> COSStreamArray for a COSStream. We can still provide an InputStream API which
> abstracts over the array of streams, because there's nothing wrong with that,
> so users can keep the same simple and convenient experience.
> An added benefit of doing this is that it will allow us to remove
> SequenceRandomAccessRead, a highly complex memory-holding class.
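The InputStream API described in the issue can be layered over an exposed list of streams using only the JDK's java.io.SequenceInputStream, with no fake stream object. Below is a minimal sketch, assuming the parts are already-decoded stream contents passed as byte arrays. The class and method names are hypothetical; this is not PDFBox's actual implementation.

{code}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ContentStreamConcat {
    /**
     * Hypothetical helper (not a PDFBox API): presents the decoded parts of a
     * page's content array as one InputStream without wrapping them in a fake stream.
     */
    static InputStream concat(List<byte[]> parts) {
        List<InputStream> streams = new ArrayList<>();
        for (byte[] part : parts) {
            streams.add(new ByteArrayInputStream(part));
            // Separate parts with whitespace so the last token of one stream
            // cannot fuse with the first token of the next (e.g. "Q" + "q").
            streams.add(new ByteArrayInputStream(new byte[] {'\n'}));
        }
        // SequenceInputStream reads each stream to EOF before moving to the next.
        return new SequenceInputStream(Collections.enumeration(streams));
    }

    /** Reads the whole stream as ASCII text, closing it afterwards. */
    static String readAll(InputStream in) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<byte[]> parts = List.of(
                "q 1 0 0 1 50 50 cm".getBytes(StandardCharsets.US_ASCII),
                "Q".getBytes(StandardCharsets.US_ASCII));
        System.out.print(readAll(concat(parts)));
    }
}
{code}

Because callers receive a plain InputStream rather than something pretending to be a COSStream, there is nothing to peer through with instanceof, and the individual streams stay available to code that needs them.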
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)