[ https://issues.apache.org/jira/browse/PDFBOX-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler closed PDFBOX-384.
-------------------------------------
Resolution: Won't Fix
Assignee: Andreas Lehmkühler
Closed, as I believe some of these ideas have already been implemented.
> Sometimes, when PDFBox writes a stream's content to a PDF file, it can no
> longer read it
> --------------------------------------------------------------------------------------
>
> Key: PDFBOX-384
> URL: https://issues.apache.org/jira/browse/PDFBOX-384
> Project: PDFBox
> Issue Type: Bug
> Components: Writing
> Affects Versions: 0.7.3
> Environment: PDFBox 0.7.3, Java 5, Windows
> Reporter: Son
> Assignee: Andreas Lehmkühler
> Attachments: COSStream.java, COSWriter.java
>
>
> When PDFBox writes a stream, it creates a Length entry in the stream
> dictionary that is an indirect reference.
> The specification states (quoted from the PDF Reference 1.5, but equally
> valid for all later editions), section 3.2.7, Stream Objects:
> ...
> stream consists of a dictionary that describes a sequence of bytes,
> followed by zero or more bytes bracketed between the keywords stream and
> endstream:
> dictionary
> stream
> ...Zero or more bytes...
> endstream
> All streams must be indirect objects (see Section 3.2.9, "Indirect
> Objects") and the stream dictionary must be a direct object. The keyword
> stream that follows the stream dictionary should be followed by an
> end-of-line marker...
> The stream dictionary must be direct. What is not stated is whether the
> entries in the dictionary must be direct as well. Later on, in the Stream
> Extent paragraph, it says:
> ...
> Every stream dictionary has a Length entry that indicates how many bytes
> of the PDF file are used for the stream's data. (If the stream has a
> filter, Length is the number of bytes of encoded data.) In addition, most
> filters are defined so that the data is self-limiting; that is, they use
> an encoding scheme in which an explicit end-of-data (EOD) marker delimits
> the extent of the data. Finally, streams are used to represent many
> objects from whose attributes a length can be inferred. All of these
> constraints must be consistent.
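To make the Stream Extent rule concrete, here is a minimal Java sketch of how a reader might use a direct Length value: it skips the end-of-line marker after the stream keyword, takes Length bytes, and checks that endstream follows (after an optional EOL, which the spec says is not counted in Length). The class StreamExtent and its method readStreamData are hypothetical illustrations, not PDFBox API, and the sketch assumes Length is already available as a plain integer rather than an indirect reference that must first be resolved.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical reader helper: locate stream data from a direct /Length value. */
public class StreamExtent {

    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    /**
     * @param pdf              raw file bytes
     * @param streamKeywordPos offset of the "stream" keyword
     * @param length           value of the (direct) /Length entry
     * @return the stream's data bytes, still encoded if the stream has a filter
     */
    static byte[] readStreamData(byte[] pdf, int streamKeywordPos, int length) {
        int pos = streamKeywordPos + "stream".length();
        // "stream" must be followed by CRLF or LF, not by a CR alone.
        if (pos < pdf.length && pdf[pos] == '\r') pos++;
        if (pos >= pdf.length || pdf[pos] != '\n') {
            throw new IllegalArgumentException("no EOL after 'stream'");
        }
        int dataStart = pos + 1;
        int dataEnd = dataStart + length;
        // An EOL before "endstream" is allowed and is not counted in Length.
        int k = dataEnd;
        if (k < pdf.length && pdf[k] == '\r') k++;
        if (k < pdf.length && pdf[k] == '\n') k++;
        if (k + ENDSTREAM.length > pdf.length
                || !Arrays.equals(Arrays.copyOfRange(pdf, k, k + ENDSTREAM.length), ENDSTREAM)) {
            throw new IllegalStateException(
                    "Length " + length + " is inconsistent with the endstream keyword");
        }
        return Arrays.copyOfRange(pdf, dataStart, dataEnd);
    }

    public static void main(String[] args) {
        String obj = "1 0 obj\n<< /Length 5 >>\nstream\r\nHELLO\r\nendstream\nendobj\n";
        byte[] pdf = obj.getBytes(StandardCharsets.US_ASCII);
        int at = obj.indexOf("stream");
        System.out.println(new String(readStreamData(pdf, at, 5), StandardCharsets.US_ASCII));
        try {
            // One byte short: "endstream" is not where Length says it should be.
            readStreamData(pdf, at, 4);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running main prints HELLO for the correct Length and a rejection message for the deliberately wrong Length of 4; this is the kind of consistency check that a Length value disagreeing with the data would fail.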
> ...
> This indicates that most, but not necessarily all, filters produce
> self-delimiting data; not every filter algorithm is required to support it.
> So, to set the Length value explicitly inside the stream dictionary, the
> content should be filtered before the dictionary is written.
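A minimal sketch of this filter-first ordering, assuming the FlateDecode filter is implemented with java.util.zip.DeflaterOutputStream (which emits the zlib format that FlateDecode expects). The class DirectLengthWriter and the method buildFlateStreamObject are hypothetical, not PDFBox code: the content is encoded into a memory buffer first, so the encoded size is known before the dictionary is written and /Length can be a direct integer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical helper: encode first, then write a dictionary with a direct /Length. */
public class DirectLengthWriter {

    static byte[] buildFlateStreamObject(int objNum, byte[] raw) {
        // 1. Filter the content into a buffer before anything is written.
        ByteArrayOutputStream encoded = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflate = new DeflaterOutputStream(encoded)) {
            deflate.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory buffer
        }
        byte[] data = encoded.toByteArray();

        // 2. The encoded size is now known, so /Length can be a direct integer.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeAscii(out, objNum + " 0 obj\n<< /Length " + data.length
                + " /Filter /FlateDecode >>\nstream\r\n");
        // 3. The encoded bytes, then an EOL that is not counted in /Length.
        out.write(data, 0, data.length);
        writeAscii(out, "\r\nendstream\nendobj\n");
        return out.toByteArray();
    }

    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        byte[] raw = "BT /F1 12 Tf 72 712 Td (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] obj = buildFlateStreamObject(4, raw);
        String text = new String(obj, StandardCharsets.ISO_8859_1);
        // Print only the object header; the compressed data is binary.
        System.out.println(text.substring(0, text.indexOf("stream")));
    }
}
```

The price of this ordering is that the whole encoded stream is held in memory before anything is written; that buffer is exactly what a deferred, indirect Length avoids.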
> The current PDFBox behavior is as follows (see
> org.pdfbox.pdfwriter.COSWriter.visitFromStream(COSStream obj), line 929):
> ...
> InputStream input = obj.getFilteredStream();
> // set the length of the stream and write stream dictionary
> COSObject lengthObject = new COSObject( null );
>
> obj.setItem(COSName.LENGTH, lengthObject);
> // write the stream dictionary (Length is still an unresolved reference)
> visitFromDictionary( obj );
> getStandardOutput().write(STREAM);
> ...
> // writes the content
> ...
> lengthObject.setObject( new COSInteger( totalAmountWritten ) );
> getStandardOutput().writeCRLF();
> getStandardOutput().write(ENDSTREAM);
> getStandardOutput().writeEOL();
> return null;
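For contrast, a minimal sketch of the deferred-length pattern the excerpt follows: the dictionary carries an indirect reference, the data is copied while its bytes are counted (the role of totalAmountWritten above), and the integer object is written only afterwards. IndirectLengthWriter and writeWithDeferredLength are hypothetical names, and the object numbers 7 and 8 are arbitrary. A reader that needs Length to find the end of the data must first resolve 8 0 R, which typically means looking up that object through the cross-reference table.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical writer: /Length is an indirect reference resolved after the data. */
public class IndirectLengthWriter {

    // One instance writes one stream object plus its length object.
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    private void ascii(String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    byte[] writeWithDeferredLength(int streamObj, int lengthObj, byte[] data) {
        // The dictionary is written before the length is known.
        ascii(streamObj + " 0 obj\n<< /Length " + lengthObj + " 0 R >>\nstream\r\n");
        // Copy the data and count the bytes as they go out.
        int total = 0;
        for (byte b : data) {
            out.write(b);
            total++;
        }
        ascii("\r\nendstream\nendobj\n");
        // Only now is the length known, so it becomes a separate object.
        ascii(lengthObj + " 0 obj\n" + total + "\nendobj\n");
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] pdf = new IndirectLengthWriter()
                .writeWithDeferredLength(7, 8, "HELLO".getBytes(StandardCharsets.US_ASCII));
        System.out.print(new String(pdf, StandardCharsets.US_ASCII));
    }
}
```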
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)