Hi Damjan,

That sounds great!

If nobody is faster than me, I will cherry-pick your commit to AOO42X tomorrow.

Thank you for your massive work.

Regards,

   Matthias

On 16.03.24 at 04:49, Damjan Jovanovic wrote:
Hi

Bug 118236 with 7 votes, the inability to open password-protected
(encrypted) OOXML files from MS Office 2010+, is now fixed in trunk :-)

---snip---
commit 506fa58b1970084a0caacb50b3a805e469be4756 (HEAD -> trunk, origin/trunk, origin/HEAD)
Author: Damjan Jovanovic <dam...@apache.org>
Date:   Sat Mar 2 18:47:05 2024 +0200

     Implement the (MS Office 2010+) OOXML "Agile encryption" support, so that we
     can open such password-protected OOXML files.

     Adds all the Agile encryption XML tokens and namespaces, and parses the XML
     from EncryptionInfo stream, gets OpenOffice to recognize the file is encrypted
     and ask for a password, and successfully decrypts the file if password is
     correct.

     Also a number of other fixes and improvements:
     - Sorted main/oox/source/token/tokens.txt so it's in alphabetical order
       (wrong order might have broken certain tokens?).
     - Refactored how OOXML encryption is generally handled. It's now in its
       own file.
     - Added logging to the FilterDetect class. It logs to the office-wide default
       logger.
     - Added a flush() method to the BinaryXOutputStream class.
     - Changed FilterDetect to use XMultiComponentFactory and XComponentContext
       instead of the deprecated XMultiServiceFactory.
     - Error handling was generally improved.
     - Exception safety and some memory safety (::std::vector instead of new[])
       in all the new code. Memory leaks should not be possible.

     Much of the code involved in the decryption was ported from the excellent
     Apache POI project, so it's been credited in our NOTICE file.

     Patch by: me
---snip---


It took much longer than I expected:

The MS-OFFCRYPTO specification was unclear, and plain wrong in some parts,
e.g. it says "SHA-1" where actual OOXML documents contain "SHA1"; I've made
our code support both.
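
To illustrate the idea of accepting both spellings, here is a minimal Python
sketch. It is not the C++ code in trunk: the function names are hypothetical,
and treating every hash name the same way (drop the hyphen, lowercase) is an
assumption that goes beyond the "SHA-1" vs "SHA1" case mentioned above.

```python
import hashlib


def normalize_hash_name(name):
    """Map spellings such as 'SHA-1' and 'SHA1' to a single lookup key.

    Assumption: removing hyphens and lowercasing is enough; only the
    SHA-1/SHA1 pair is known from the text above.
    """
    return name.strip().replace("-", "").lower()


def new_digest(name):
    """Create a digest object for either spelling of the algorithm name."""
    return hashlib.new(normalize_hash_name(name))


if __name__ == "__main__":
    for spelling in ("SHA-1", "SHA1"):
        digest = new_digest(spelling)
        digest.update(b"abc")
        print(spelling, digest.hexdigest())
```

Normalizing once, at the point where the name is read from the document,
keeps the rest of the code free of spelling checks.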

The "Standard" encryption from MS Office 2007 that we already supported was
itself a mess, and much work was needed to refactor and clean up that code
before the "Agile" encryption could also be added.

Then XML parsing had to be added, since Agile encryption specifies settings
in XML instead of binary like Standard encryption did. XML handling in
OpenOffice is pretty outdated, with no support for namespaces, but at least
the newer "FastParser" does support namespaces and is in fact very fast
because it converts strings to unique integers, and packs namespaces into
bit fields, for faster comparisons. I ended up updating the main/oox
FastParser to support the new Agile encryption namespaces and elements.
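
The token idea described above can be sketched roughly as follows, in Python
for brevity. This is only an illustration, not the main/oox FastParser: the
TokenTable class, the 16-bit shift, the "urn:example:enc" namespace and the
sample XML are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: the low 16 bits hold the element id, the bits above hold the
# namespace id. The real bit layout may differ.
NS_SHIFT = 16

# Hypothetical sample; not a real EncryptionInfo stream.
SAMPLE = (
    '<encryption xmlns="urn:example:enc">'
    '<keyData hashAlgorithm="SHA1"/>'
    "</encryption>"
)


class TokenTable:
    """Assigns each namespace and each local name a unique small integer."""

    def __init__(self):
        self._ns_ids = {}
        self._name_ids = {}

    @staticmethod
    def _intern(table, text):
        # First sighting gets the next free id (ids start at 1).
        return table.setdefault(text, len(table) + 1)

    def token(self, namespace, local_name):
        ns_id = self._intern(self._ns_ids, namespace)
        name_id = self._intern(self._name_ids, local_name)
        # Pack both ids into one integer so that matching an element is a
        # single integer comparison.
        return (ns_id << NS_SHIFT) | name_id


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag form into its two parts."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


def tokenize(xml_text, table):
    """Return one packed token per element, in document order."""
    root = ET.fromstring(xml_text)
    return [table.token(*split_tag(el.tag)) for el in root.iter()]


if __name__ == "__main__":
    table = TokenTable()
    key_data = table.token("urn:example:enc", "keyData")
    for tok in tokenize(SAMPLE, table):
        print(hex(tok), "keyData" if tok == key_data else "other")
```

After the strings have been converted once, the handler code compares
integers only, which is where the speed comes from.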

MS-OFFCRYPTO also only describes encryption, not decryption, and since we
can only read OOXML, only decryption matters.

Apache POI code was tremendously helpful in figuring out the decryption
process. Most of the decryption code I added was just ported directly from
theirs, and thus I've added Apache POI to our NOTICE file (please check
that I've done it correctly). Also several bugs were figured out by
simultaneously stepping through our code in gdb and their code in NetBeans,
and comparing respective values. A big thank you to the Apache POI
developers, whose OOXML support is still better than ours in many ways!

I used OpenSSL for all the message digest and encryption stuff, both
because our MD5 and SHA1 algorithms are broken (bug 127661), and because
Agile encryption requires many digests and ciphers that OpenSSL supports
but we don't.

Anyway, it works now. All encrypted OOXML files should work, e.g. text
documents, spreadsheets, presentations, etc.

Other issues I am aware of:
- We only support password-encrypted documents; certificate-encrypted
documents are not supported yet. ODF 1.3 also added certificate encryption,
so maybe that's something we should develop together.
- There are other variations of encryption we still don't support, e.g. the
"Extensible" encryption, the "RC4 CryptoAPI" encryption, "XOR obfuscation",
etc. Apache POI would be a good source for those too. It's unclear to me
how widely those are used, and whether they are worth implementing.
- It may need to be rearchitected when we add OOXML writing.
- A lot of other required cleanups to our code were discovered; I will
discuss those separately.

I've squashed all my work into a single commit, so it can be easily
cherry-picked to AOO42X and maybe even AOO41X when people are happy with it.

Regards
Damjan
