All,
I recently noticed that our info here [0] is out of date. I think we should
update that page to reflect Tika as a top-level project. Also, I suspect that
when we initially entered our info on that site and sent notification to
BIS/NSA (TIKA-118), we were only handling encryption for PDFs. So, I think we
should also update the links to include the source repositories for other
dependencies that rely on encryption.
From a quick review, here is my current understanding of the parsers that
use encryption in Tika:
a) PDFParser: has its own RC4Cipher (for writing; not used by Tika but
probably bundled) and otherwise relies on Bouncy Castle
b) POI: can rely on Bouncy Castle but also uses its own encryption algorithms
c) JackcessParser: relies on the jackcess-encrypt package, which relies on
Bouncy Castle
d) Pkcs7Parser: relies on Bouncy Castle directly (CMSSignedDataParser)
e) PkgParser: relies on Apache Commons Compress' SevenZFile, which uses the
javax.crypto package
Is the above info correct? Any others?
Given the changes in our code and our dependencies, I figure that we may as
well update/resend our notification to BIS/NSA [1].
Does this sound reasonable? I'll open a ticket if so.
Best,
Tim
[0] http://www.apache.org/licenses/exports/
[1] http://www.apache.org/dev/crypto.html#notify