https://bugs.documentfoundation.org/show_bug.cgi?id=156478
Bug ID: 156478
Summary: PDF export: Drop bloat of "default" /MediaBox in root
node of the page tree
Product: LibreOffice
Version: Inherited From OOo
Hardware: All
OS: All
Status: UNCONFIRMED
Keywords: difficultyBeginner, easyHack, filter:pdf, skillCpp
Severity: normal
Priority: medium
Component: Printing and PDF export
Assignee: [email protected]
Reporter: [email protected]
Export any document from LibreOffice, and inspect the PDF in a plain text
editor.
There is a separate page object for each exported page, starting with
'<</Type/Page/Parent', each with its own /MediaBox. There is also a root node of
the page tree, starting with '<</Type/Pages', which lists all the document pages
under its Kids and has its own /MediaBox.
The latter /MediaBox of the root is redundant and should not be emitted.
The code pointer: PDFWriterImpl::emitCatalog in
vcl/source/gdi/pdfwriter_impl.cxx.
Rationale:
The PDF 1.7 standard [PDF 32000-1:2008] has this under 7.7.3 "Page Tree",
7.7.3.4 "Inheritance of Page Attributes":
> Some page attributes ... are designated as *inheritable*. ...
> EXAMPLE A document may specify the same media box for all of its pages by
> including a MediaBox entry in the root node of the page tree. If
> necessary, an individual page object may override this inherited
> value with a MediaBox entry of its own.
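
As a rough illustration of the inheritance rule quoted above, the lookup a PDF
consumer performs can be sketched in C++ as follows. The Node struct and the
effectiveMediaBox function are hypothetical names invented for this sketch,
not part of LibreOffice or of any PDF library; the sketch assumes C++17 for
std::optional.

```cpp
#include <array>
#include <optional>

// [llx lly urx ury] in default user space units.
using Box = std::array<double, 4>;

// Hypothetical, simplified page tree node: either a page or an intermediate
// /Pages node. Only the inheritable MediaBox attribute is modelled here.
struct Node
{
    std::optional<Box> mediaBox;  // empty if the node has no /MediaBox entry
    const Node* parent = nullptr; // nullptr for the root of the page tree
};

// Walks from the page towards the root and returns the first MediaBox found.
// An entry on the page itself therefore always wins over an inherited one.
std::optional<Box> effectiveMediaBox(const Node& page)
{
    for (const Node* node = &page; node != nullptr; node = node->parent)
    {
        if (node->mediaBox)
            return node->mediaBox;
    }
    return std::nullopt;
}
```

With such a lookup, a MediaBox in the root node is only ever reached for a
page that has no MediaBox entry of its own.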
Note that the MediaBox entry in the root is optional. Note also that
LibreOffice always outputs the individual pages' MediaBox entries, so the
"default" value is never used.
Initially, the output of this entry was introduced in commit
df0f52d3aadea5c4d5f600d1533901af1087b464 (#100608# preparations for PDF export,
2002-07-08); there, it used hardcoded A4 page values (i.e., it didn't
consider actual document page sizes at all).
Later, in commit 98468607f7a8d0b1f5f7e3ecd09f756aea904d00 (INTEGRATION: CWS
vcl87 (1.122.16); FILE MERGED, 2008-04-03, related to i#75941), document pages
started to be taken into account, to work around a third-party bug in
ImageMagick. The procedure to calculate the MediaBox values was strange: it
took the maximal width independently of the maximal height, so the "default"
size could end up matching no actual page size in the document.
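
A minimal sketch of that effect, not the actual LibreOffice code: PageSize,
independentMaxima and matchesAnyPage are hypothetical names used only for
illustration, and sizes are assumed to be in points.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a page size in points; not a LibreOffice type.
struct PageSize
{
    double width;
    double height;
};

// Takes the largest width and the largest height separately, so the two
// values may come from different pages.
PageSize independentMaxima(const std::vector<PageSize>& pages)
{
    assert(!pages.empty());
    PageSize result{ 0.0, 0.0 };
    for (const PageSize& page : pages)
    {
        result.width = std::max(result.width, page.width);
        result.height = std::max(result.height, page.height);
    }
    return result;
}

// True if at least one page has exactly the given size.
bool matchesAnyPage(const PageSize& size, const std::vector<PageSize>& pages)
{
    return std::any_of(pages.begin(), pages.end(), [&size](const PageSize& page) {
        return page.width == size.width && page.height == size.height;
    });
}
```

For a document with one A4 portrait page (595 x 842) and one A4 landscape
page (842 x 595), independentMaxima yields 842 x 842, which is the size of
neither page.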
In commit 4830592b780833cf5eee2aef30bc9c5d444dfb24 (PDF export: fix handling of
page sizes larger than 508 cm, 2020-04-16), the values started to take UserUnit
into account, but again in a wrong way: the maximal vertical size set its
corresponding UserUnit independently of the maximal horizontal size. So it was
possible to have a super-wide page (e.g., 5500 mm wide, 210 mm high), followed
by a relatively high but narrow page (e.g., 210 x 297 mm). The final UserUnit
applied to both dimensions was then taken from the last maximal size found,
which happened to be the height of the last page, and which gave a UserUnit of
1 (instead of the 2 required for 5500 mm, which is larger than 14400 pt).
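
The following sketch reproduces that scenario under stated assumptions; it is
not the code from the commit. All names are hypothetical, and userUnitFor
assumes that UserUnit is the smallest integer scale that brings a size down
to at most 14400 units.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Largest page dimension in points that fits with a UserUnit of 1.
constexpr double MAX_UNITS = 14400.0;

// Hypothetical page size in points.
struct PagePt
{
    double width;
    double height;
};

constexpr double mmToPt(double mm) { return mm / 25.4 * 72.0; }

// Assumed rule: smallest integer scale that fits sizePt into MAX_UNITS.
int userUnitFor(double sizePt)
{
    assert(sizePt >= 0.0);
    return std::max(1, static_cast<int>(std::ceil(sizePt / MAX_UNITS)));
}

// Flawed variant: the UserUnit is overwritten whenever either maximum grows,
// so the last maximum found decides the result for both dimensions.
int userUnitFromLastMaximum(const std::vector<PagePt>& pages)
{
    double maxWidth = 0.0;
    double maxHeight = 0.0;
    int userUnit = 1;
    for (const PagePt& page : pages)
    {
        if (page.width > maxWidth)
        {
            maxWidth = page.width;
            userUnit = userUnitFor(maxWidth);
        }
        if (page.height > maxHeight)
        {
            maxHeight = page.height;
            userUnit = userUnitFor(maxHeight);
        }
    }
    return userUnit;
}

// Consistent variant: derive the UserUnit from the single largest dimension.
int userUnitFromLargestDimension(const std::vector<PagePt>& pages)
{
    double largest = 0.0;
    for (const PagePt& page : pages)
        largest = std::max({ largest, page.width, page.height });
    return userUnitFor(largest);
}
```

For the two pages from the example (5500 x 210 mm, then 210 x 297 mm), the
flawed variant returns 1, while the largest dimension requires 2.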
All in all, this is just bloat, creating quite some unneeded maintenance
effort (in addition to the mentioned commits, see e.g. commit
48aaca5ba9c27a247ed502c3db827c6ac9f34df9 - tdf#148033: Loss of precision in
/MediaBox (PDFWriterImpl::emitCatalog()), 2022-03-18), and still not done
properly. It is not (or must not be) used by any software in the presence of
the individual pages' MediaBox entries. It must be dropped, except in the case
of a "sanity check failure", when there are no pages in the document (this
can't happen, but the check is there, so be safe and emit a hardcoded A4
MediaBox in that case).
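
The requested behaviour could look roughly like the sketch below. It is only
an illustration: pagesRootNode is a hypothetical helper that builds a
simplified root node as a string, it does not mirror how
PDFWriterImpl::emitCatalog writes its output, and the A4 values of 595 x 842
points are an assumed rounding.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds a simplified '<</Type/Pages ...>>' root node from the object numbers
// of the page objects. Generation numbers are assumed to be 0.
std::string pagesRootNode(const std::vector<int>& kidObjectNumbers)
{
    std::string node = "<</Type/Pages";

    // Fallback only: with no pages there is no per-page MediaBox at all,
    // so emit a hardcoded A4 box. Otherwise the root carries no MediaBox.
    if (kidObjectNumbers.empty())
        node += "/MediaBox[0 0 595 842]";

    node += "/Kids[";
    for (int objectNumber : kidObjectNumbers)
    {
        assert(objectNumber > 0); // indirect objects have positive numbers
        node += " " + std::to_string(objectNumber) + " 0 R";
    }
    node += " ]/Count " + std::to_string(kidObjectNumbers.size()) + ">>";
    return node;
}
```

For kids 3 and 5 this gives '<</Type/Pages/Kids[ 3 0 R 5 0 R ]/Count 2>>',
with no /MediaBox in the root; only the empty case carries the A4 fallback.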