https://bugs.documentfoundation.org/show_bug.cgi?id=66580

Dave Gilbert <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]
                   |                            |om

--- Comment #30 from Dave Gilbert <[email protected]> ---
(Copying Caolán in due to the hybrid performance hack)
Hi Tomaz,
  Thanks for replying.
For context, one reason I'm asking about this stuff is that I just fixed the
PDF import filter to handle PDF2 encryption, but not with hybrid yet. I was
looking at how to fix that, and I know a fairly easy way to do it, but
understanding the details of hybrid seems to make sense first!

>>   a) Without a trailer, I don't understand how a hybrid loader is supposed
>> to understand that a PDF embedded file is actually a hybrid PDF rather than
>> just a PDF that happens to have an attachment; a PDF could have loads of
>> embedded files and none of them actually represent the same contents as the
>> PDF (e.g. maybe include raw data spreadsheets or something with an
>> explanatory PDF).

> Using a filename convention "Original.od*" and making sure it's a compatible 
> document. 
> Or in PDF 2.0 - the embedded file whose /AFRelationship is /Source.

OK, I don't know the details of PDF 2.0 yet, so I'll take your word for that;
but is that something the writer already does?
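
A rough sketch of the selection rule as I understand it from the reply above:
prefer an embedded file marked with AFRelationship "Source", otherwise fall
back to the "Original.od*" name convention. The `EmbeddedFile` record and
`pick_hybrid_source` are hypothetical names for illustration only, not
existing LibreOffice code, and the "make sure it's a compatible document"
check is deliberately left out.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Sequence


@dataclass
class EmbeddedFile:
    """Hypothetical record for one embedded file already parsed out of a PDF."""
    name: str
    af_relationship: Optional[str] = None  # e.g. "Source" or "Data"; None if absent


def pick_hybrid_source(files: Sequence[EmbeddedFile]) -> Optional[EmbeddedFile]:
    """Return the embedded file most likely to be the hybrid original, if any."""
    # First choice: an explicit PDF 2.0 style marking.
    for f in files:
        if f.af_relationship == "Source":
            return f
    # Fallback: the filename convention mentioned in the reply.
    for f in files:
        if fnmatchcase(f.name, "Original.od*"):
            return f
    # Plain attachments (raw data spreadsheets etc.) are not treated as hybrid.
    return None
```

With this rule, a PDF carrying only something like "rawdata.ods" would not be
treated as a hybrid, which is the case I was worried about in (a).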

>>   b) caolanm's performance hack  - 'detectHasAdditionalStreams' from ~2023 -
>> sniffs the trailer at the end of the file to make a quick detection about
>> whether it might be a hybrid.  So if the use of the trailer disappears you
>> lose that performance trick.

> Well, that's unfortunate but reading the xref entry and checking the objects
> for an embedded file is also fast and can spare you from loading the whole
> PDF.

Before I break it, I'd like to give Caolán (copied in) a chance to say stuff.
My feeling is that Caolán's hack is unreasonably fast - it was driven by this
ticket:
https://github.com/CollaboraOnline/online/issues/7307
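
To make it clear why the trailer sniff is so cheap, here is a minimal sketch of
the general idea: read only the last few KB of the file and look for a marker
key after the final "trailer" keyword. This is not the actual
'detectHasAdditionalStreams' code. The function name, the 4096 byte tail size
and the "/AdditionalStreams" key name are my assumptions for illustration.

```python
import os


def tail_mentions_additional_streams(path: str, tail_size: int = 4096) -> bool:
    """Cheap hybrid hint: inspect only the end of the file, never the whole PDF."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Seek straight to the tail so the cost does not grow with file size.
        f.seek(max(0, size - tail_size))
        tail = f.read()
    # Only look after the last "trailer" keyword found in the tail.
    pos = tail.rfind(b"trailer")
    if pos == -1:
        # No classic trailer in the tail (e.g. an xref stream is used instead).
        return False
    # Assumed marker key; the real key name should be checked against the writer.
    return b"/AdditionalStreams" in tail[pos:]
```

The cost is one seek and one small read, independent of the size of the PDF.
Any replacement that has to follow the xref to individual objects will need
more reads than that, even if it still avoids loading the whole file.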

> Also BTW, we don't write those to the trailer anymore, if we export to PDF/A
> variant or PDF/UA is enabled.

Hmm, so we don't - we really should have a test for each of the hybrid types we
support to make sure we can read them!  We've already got a couple in there,
but not any of the new types (which we can't read!)

Also:
  - Do we have a 'spec' of hybrid anywhere?
  - If we change it, what are the rules on compatibility for old docs?
  - My guess for DocumentChecksum had been that it was intended to stop the pdf
and the original content getting out of sync - but that would want a checksum
of the embedded document, not the whole file, wouldn't it?
  - If we needed to check the embedded file name and/or the AFRelationship you
mentioned, is that doable on a file prior to decryption?
  - My relatively easy way of fixing the PDF2 encrypted hybrids is to do the
same thing I did for PDF2 encrypted non-hybrid, and use Poppler to do the
hybrid extraction; then we can lose our extra PDF parser code.
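
On the checksum question in the list above, a toy sketch of the difference
between the two scopes. Nothing here reflects the real hybrid layout or the
hash the writer actually uses: the file format, the marker and all function
names are made up for illustration, and SHA-256 is an arbitrary choice.

```python
import hashlib

MARKER = b"\n%%payload\n"  # made-up separator for this toy format


def build_toy_hybrid(visible: bytes, payload: bytes) -> bytes:
    """Toy stand-in for a hybrid file: visible PDF part plus embedded payload."""
    return b"%PDF-toy\n" + visible + MARKER + payload


def extract_payload(file_bytes: bytes) -> bytes:
    """Return everything after the toy marker."""
    return file_bytes.split(MARKER, 1)[1]


def whole_file_digest(file_bytes: bytes) -> str:
    return hashlib.sha256(file_bytes).hexdigest()


def payload_digest(file_bytes: bytes) -> str:
    return hashlib.sha256(extract_payload(file_bytes)).hexdigest()


payload = b"toy ODF bytes"
before = build_toy_hybrid(b"page text v1", payload)
after = build_toy_hybrid(b"page text v2", payload)  # only the visible part changed

print(whole_file_digest(before) == whole_file_digest(after))  # False
print(payload_digest(before) == payload_digest(after))        # True
```

A whole-file checksum notices any change to the file, while a checksum of the
embedded document only notices changes to that document, so which one is
wanted depends on what the field is supposed to protect against.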
