[Libreoffice-bugs] [Bug 95328] Hybrid PDF change and use PDF 1.7 attachment to PDF, rather than current append to PDF 1.4

bugzilla-daemon Tue, 03 Nov 2015 13:00:49 -0800

https://bugs.documentfoundation.org/show_bug.cgi?id=95328


--- Comment #8 from [email protected] ---
(In reply to V Stuart Foote from comment #7)

> It has now been given an appropriate extended scope & summary, the what and
> why of implementing ISO 32000-1 attachment 

As I explained already: it is *not* the (scary sounding) PDF-1.7/ISO 32000-1
compliant file attachment that should be implemented. It is the PDF-1.3 one....

[....]

> > Oh! And how do you currently open a Hybrid PDF with LO?!??
> > 
> > (You look at the key special entry in the trailer section. You can still
> > keep that, or a slightly modified key entry for LO's benefit to make it
> > easier to recognize its own source document format as being embedded. But
> > add the other modifications for the benefit of other PDF processing
> > applications to recognize the embedded file.)
> 
> Not sure... Reading the code, for import it looks like we parse everything
> beyond the PDF ending--and extract our content streams of interest matched
> only against MIME type.

I can't read the code... but I can read+analyze the PDF's source code.

For *importing* a PDF, there are two possibilities:

1. You have an OO-/LO-generated "hybrid" PDF: discover+extract the ODT stream
2. You have a non-hybrid PDF: open it with LO-Draw.

For case (1), the most efficient procedure would be:

1. Read the PDF trailer. It is at the end of the file. Each and every PDF 
   reader has to do that and has to start reading there. The trailer contains
   an entry pointing to the byte offset to the start of the xref table.

2. For OO-/LO-generated PDFs, the trailer also contains the "proprietary"
   key:

     /AdditionalStreams [/application#2Fvnd#2Eoasis#2Eopendocument#2Etext 6 0 R]

   This key names the PDF object number (here: object number 6) which has a
   stream that contains the ODT document.

3. Jump to the xref table and read it. The xref table contains a list of 
   all used PDF objects and their respective file offsets.

4. Jump to the byte offset named for the object no. 6 and extract the stream
   content. The stream content is a 1:1 copy of the original ODT file.

5. Only if you skip step (2) above would you have to "parse everything
   beyond the PDF ending and extract our content streams of interest".
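
Steps (1) to (4) can be sketched in a few lines of Python. This is only an
illustration of the lookup order, not LO's import code (which I have not
read). It assumes a single classic (uncompressed) xref table, a stream
dictionary without nested dictionaries, and a direct integer /Length. The
function names are made up for this example.

```python
import re

def read_xref(pdf: bytes, xref_offset: int) -> dict:
    """Step 3: map object number -> byte offset from a classic xref table."""
    tokens = pdf[xref_offset:pdf.index(b"trailer", xref_offset)].split()
    if not tokens or tokens[0] != b"xref":
        raise ValueError("no classic xref table at the given offset")
    offsets, i = {}, 1
    while i < len(tokens):
        # Each subsection starts with "<first object number> <entry count>".
        first, count = int(tokens[i]), int(tokens[i + 1])
        i += 2
        for obj_num in range(first, first + count):
            offset, _generation, kind = tokens[i:i + 3]
            if kind == b"n":  # "n" = in use, "f" = free
                offsets[obj_num] = int(offset)
            i += 3
    return offsets

def extract_hybrid_stream(pdf: bytes):
    """Return (mime_type, stream_bytes) named by /AdditionalStreams."""
    # Step 1: the last "startxref" keyword holds the xref table's byte offset.
    sx = pdf.rfind(b"startxref")
    if sx < 0:
        raise ValueError("no startxref keyword found")
    xref_offset = int(pdf[sx + len(b"startxref"):].split()[0])

    # Step 2: look for the /AdditionalStreams key in the trailer dictionary.
    trailer_at = pdf.index(b"trailer", xref_offset)
    key = re.search(
        rb"/AdditionalStreams\s*\[\s*/(\S+)\s+(\d+)\s+(\d+)\s+R\s*\]",
        pdf[trailer_at:sx],
    )
    if key is None:
        raise ValueError("no /AdditionalStreams key: not a hybrid PDF")
    # PDF names escape characters as #xx, e.g. #2F is "/" and #2E is ".".
    mime = re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), key.group(1))
    obj_num = int(key.group(2))

    # Step 3: read the xref table to find where that object starts.
    start = read_xref(pdf, xref_offset)[obj_num]

    # Step 4: jump to the object and copy out /Length bytes of stream data.
    head = re.compile(rb"\d+\s+\d+\s+obj\s*<<(.*?)>>\s*stream\r?\n",
                      re.S).match(pdf, start)
    if head is None:
        raise ValueError("object %d is not a simple stream object" % obj_num)
    length = re.search(rb"/Length\s+(\d+)(\s+\d+\s+R)?", head.group(1))
    if length is None or length.group(2):
        raise NotImplementedError("missing or indirect /Length not handled")
    size = int(length.group(1))
    return mime.decode("latin-1"), pdf[head.end():head.end() + size]
```

With a hybrid PDF read via open(path, "rb").read(), the call would return
the MIME type named in the trailer key together with the raw ODT bytes,
which can then be written to a file and opened as a normal document.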


> When exporting, we end the PDF stream, and then
> append a stream holding the source ODF archive. So we are not now inside the
> PDF structure at all.

Not correct. The PDFs that LO actually generates as "hybrid" do write the ODT
archive right *into* the PDF structure. In my above example, it was object
no. 6 (out of a total of 17 objects in the PDF file), and that object was at
byte offset 479 (out of a total file size of 5,023,779 bytes).

