Copied: tika/site/src/site/apt/security-model.apt (from r1925951, tika/site/src/site/apt/security.apt)
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/security-model.apt?p2=tika/site/src/site/apt/security-model.apt&p1=tika/site/src/site/apt/security.apt&r1=1925951&r2=1926757&rev=1926757&view=diff
==============================================================================
--- tika/site/src/site/apt/security.apt (original)
+++ tika/site/src/site/apt/security-model.apt Fri Jun 27 16:49:13 2025
@@ -1,5 +1,5 @@
 ----------------
- Security
+ Security Model
 ----------------
 ~~ Licensed to the Apache Software Foundation (ASF) under one or more
@@ -17,220 +17,33 @@
 ~~ See the License for the specific language governing permissions and
 ~~ limitations under the License.
-Security
+Security Model
- The following is an incomplete list of known and fixed
- Critical Vulnerabilities and Exposures (CVEs) and other
- vulnerabilities in Apache Tika or its dependencies. Please
- help us fill this in with more details.
-
-
-*-------------*-------------*----------------*------------------*
-|CVE or Vulnerability| Description | Reporter | Affected Versions|
-*-------------*-------------*----------------*------------------*
-| {{{https://nvd.nist.gov/vuln/detail/CVE-2023-42503} CVE-2023-42503}}
- | commons-compress uncontrolled resource consumption vulnerability while parsing tar files| ??? | ???->2.9.0 |
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread/wfno8mf5nlcvbs78z93q9thgrm30wwfh} CVE-2022-33879}}
- | Regex DoS in StandardsExtractingContentHandler; incomplete fix for CVE-2022-30973/CVE-2022-30216 and a new one | Tony Torralba, Jaroslav Lobačevski and Tim Allison |???-2.4.0 and ???-1.28.3|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread/gqvb5t4p7tmdpl0y5bdbf72pgxj04h7p} CVE-2022-30973}}
- | Regex DoS in StandardsExtractingContentHandler; missed fix in 1.28.2 | Cathy Hu, SUSE Software Solutions Germany GmbH |???-1.28.2|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread/t3tb51sf0k2pmbnzsrrrm23z9r1c10rk} CVE-2022-25169}}
- | BPGParser Memory Usage DoS | ??? |???-2.3.0 and ???-1.28.1|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread/dh3syg68nxogbmlg13srd6gjn3h2z6r4} CVE-2022-30216}}
- | Regex DoS in StandardsExtractingContentHandler | CodeQL team members Tony Torralba and Joseph Farebrother |???-2.3.0 and ???-1.28.1|
-*-------------*-------------*----------------*------------------*
-| {{{https://nvd.nist.gov/vuln/detail/CVE-2021-44832} CVE-2021-44832}}
- | Remote Code Execution via JDBC Appender in log4j2 | ??? |2.0.0-BETA-2.2.1|
-*-------------*-------------*----------------*------------------*
-| {{{https://nvd.nist.gov/vuln/detail/CVE-2021-44228} CVE-2021-44228}}
- | Critical Remote Code Execution in log4j2 | ??? |2.0.0-BETA-2.1.0|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/ra2ab0ce69ce8aaff0773b8c1036438387ce004c2afc6f066626e205e%40%3Cusers.pdfbox.apache.org%3E} CVE-2021-31812}}
- | Infinite loop when loading a crafted PDF in PDFBox before 2.0.24 | Chaoyuan Peng |?-1.26|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/re3bd16f0cc8f1fbda46b06a4b8241cd417f71402809baa81548fc20e%40%3Cusers.pdfbox.apache.org%3E} CVE-2021-31811}}
- | OutOfMemoryException when loading a crafted PDF in PDFBox before 2.0.24 | Chaoyuan Peng |?-1.26|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/r915add4aa52c60d1b5cf085039cfa73a98d7fae9673374dfd7744b5a%40%3Cdev.tika.apache.org%3E} CVE-2021-28657}}
- | Infinite loop in the MP3Parser.| Khaled Nassar |?-1.25|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/rf35026148ccc0e1af133501c0d003d052883fcc65107b3ff5d3b61cd%40%3Cusers.pdfbox.apache.org%3E}CVE-2021-27906}}
- | Out of memory error while loading a file in PDFBox before 2.0.23.| Fabian Meumertzheim |?-1.25|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/r4717f902f8bc36d47b3fa978552a25e4ed3ddc2fffb52b94fbc4ab36%40%3Cusers.pdfbox.apache.org%3E} CVE-2021-27807}}
- | Infinite loop while loading a file in PDFBox before 2.0.23.| Fabian Meumertzheim |?-1.25|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/r4d943777e36ca3aa6305a45da5acccc54ad894f2d5a07186cfa2442c%40%3Cdev.tika.apache.org%3E} CVE-2020-9489}}
- | System.exit vulnerability in Tika's OneNote Parser; out of memory errors and/or infinite loops in Tika's ICNSParser, MP3Parser, MP4Parser, SAS7BDATParser, OneNoteParser and ImageParser.| Tim Allison |1.0-1.24|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/r463b1a67817ae55fe022536edd6db34e8f9636971188430cbcf8a8dd%40%3Cdev.tika.apache.org%3E} CVE-2020-1950}}
- | Excessive memory usage (DoS) vulnerability in Apache Tika's PSDParser |Pierre Ernst |1.0-1.23|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/rd8c1b42bd0e31870d804890b3f00b13d837c528f7ebaf77031323172%40%3Cdev.tika.apache.org%3E} CVE-2020-1951}}
- | Infinite Loop (DoS) vulnerability in Apache Tika's PSDParser |Tim Allison |1.0-1.23|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/fe876a649d9d36525dd097fe87ff4dcb3b82bb0fbb3a3d71fb72ef61@%3Cdev.tika.apache.org%3E} CVE-2019-10094}}
- | StackOverflow from Crafted Package/Compressed Files in Apache Tika's RecursiveParserWrapper|Tim Allison; files contributed by Matthew Barber and Erling Ellingsen |1.7-1.21|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/a5a44eff1b9eda3bc69d22943a1030c43d376380c75d3ab04d0c1a21@%3Cdev.tika.apache.org%3E} CVE-2019-10093}}
- | Denial of Service in Apache Tika's 2003ml and 2006ml Parsers|Tim Allison|1.19-1.21|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/1c63555609b737c20d1bbfa4a3e73ec488e3408a84e2f5e47e1b7e08@%3Cdev.tika.apache.org%3E} CVE-2019-10088}}
- | OOM from a crafted Zip File in Apache Tika's RecursiveParserWrapper|RunningSnail|1.7-1.21|
-*-------------*-------------*----------------*------------------*
-| {{{https://issues.apache.org/jira/browse/PDFBOX-4550} PDFBOX-4550}}
- | OOM from corrupt ToUnicode stream in PDFs|Tilman Hausherr|?-1.21|
-*-------------*-------------*----------------*------------------*
-| {{{https://nvd.nist.gov/vuln/detail/CVE-2019-0228} CVE-2019-0228}}
- | XML External Entity (XXE) in xfdf loading in PDFBox (regular Tika parsing would likely not be vulnerable) |Kurt Boberg|?-1.20|
-*-------------*-------------*----------------*------------------*
-| {{{https://nvd.nist.gov/vuln/detail/CVE-2018-20346} CVE-2018-20346}}
- | (Provided) SQLite before 3.52.3 allows remote attackers to execute arbitrary code|Pat Cashman (notified Tika team)|?-1.20|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/7c021a4ea2037e52e74628e17e8e0e2acab1f447160edc8be0eae6d3@%3Cdev.tika.apache.org%3E}CVE-2018-17197}}
- | Infinite Loop in Tika's SQLite3Parser |Tim Allison |1.8-1.19.1|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/88de8350cda9b184888ec294c813c5bd8a2081de8fd3666f8904bc05@%3Cdev.tika.apache.org%3E}CVE-2018-11796}}
- | XML Entity Expansion in Tika's SAXParsers after reset() |Slava Gorelik |?-1.19|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/b7eb142436d2620646d1da087ca004159241d3930a9463b476700a4d@%3Cdev.pdfbox.apache.org%3E}CVE-2018-11797}}
- | Very long loop parsing page tree in PDFBox |Shawn Rasheed and Jens Dietrich |?-1.19|
-*-------------*-------------*----------------*------------------*
-| {{{http://mail-archives.us.apache.org/mod_mbox/www-announce/201808.mbox/%[email protected]%3E}CVE-2018-11771}}
- | Infinite Loop in Commons-Compress ZipArchiveInputStream |Tobias Ospelt |?-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/72df7a3f0dda49a912143a1404b489837a11f374dfd1961061873a91@%3Cdev.tika.apache.org%3E}CVE-2018-8017}}
- | Infinite Loop in IptcAnpaParser|Rohan Padhye and Tobias Ospelt |1.2-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/9f62f742fd4fcd81654a9533b8a71349b064250840592bcd502dcfb6@%3Cusers.pdfbox.apache.org%3E}CVE-2018-8036}}
- | Infinite Loop leading to OOM in PDFBox's AFMParser|Tobias Ospelt |?-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-12418}CVE-2018-12418}}
- | Infinite Loop in junrar|Tobias Ospelt |?-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/5553e10bba5604117967466618f219c0cae710075819c70cfb3fb421@%3Cdev.tika.apache.org%3E}CVE-2018-11761}}
- | XML Entity Expansion Vulnerability|Renfei (Brian) Wang |0.1-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/ab2e1af38975f5fc462ba89b517971ef892ec3d06bee12ea2258895b@%3Cdev.tika.apache.org%3E}CVE-2018-11762}}
- | Rare Zip Slip Vulnerability in tika-app|Tim Allison |0.9-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{http://mail.openjdk.java.net/pipermail/sound-dev/2015-September/000349.html}RIFFReader}}
- | Infinite Loop in AudioParser in Java 8 and 9|Sergey Bylokhov and Tobias Ospelt |?-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://issues.apache.org/jira/browse/TIKA-2446}TIKA-2446}}
- | OOM detecting OPCPackage files with corrupt ZIP|Thorsten Schäfer |?-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://issues.apache.org/jira/browse/PDFBOX-4014}PDFBOX-4014}}
- | Infinite loop in JBig2 (versions less than 3.0.0) | Hanno Böck | (if user supplied) ?-1.17|
-*-------------*-------------*----------------*------------------*
-| {{{https://www.cvedetails.com/cve/CVE-2018-1339}CVE-2018-1339}}
- | Infinite loop in ChmParser|Tobias Ospelt |?-1.17|
-*-------------*-------------*----------------*------------------*
-|{{{https://www.cvedetails.com/cve/CVE-2018-1338}CVE-2018-1338}}
- | Infinite loop in BPGParser| Tobias Ospelt | ?-1.17|
-*-------------*-------------*----------------*------------------*
-|{{{http://mail-archives.apache.org/mod_mbox/www-announce/201804.mbox/%3CCAC1dCwVhrPRyFJMS5BbY02%2B495CUODrAzndqZkvKacJnXUSm%2Bw%40mail.gmail.com%3E}CVE-2018-1335}}
- | Command Execution in tika-server | Tim Allison | ?-1.17|
-*-------------*-------------*----------------*------------------*
-|{{{https://www.cvedetails.com/cve/CVE-2017-12626}CVE-2017-12626}}
- | Apache POI - Infinite loops in WMF, EMF, MSG and macros; OOMs in DOC, PPT and XLS | Tim Allison, Luís Filipe Nassif and Jerome Lacoste| ?-1.17|
-*-------------*-------------*----------------*------------------*
-|{{{https://nvd.nist.gov/vuln/detail/CVE-2018-1324}CVE-2018-1324}} and {{{https://issues.apache.org/jira/browse/COMPRESS-432}COMPRESS-432}}
- | Commons Compress - Infinite loop in ZipFile | Luís Filipe Nassif and Anton Abashkin | ?-1.17|
-*-------------*-------------*----------------*------------------*
-|{{{https://www.cvedetails.com/cve/CVE-2018-7489/}CVE-2018-7489}}
-and {{{https://issues.apache.org/jira/browse/TIKA-2634}TIKA-2634}}
- | Jackson - Deserialization vulnerability | Richard Cyganiak (notified Tika team) | ?-1.17|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/PDFBOX-3919}PDFBOX-3919}}
- | Apache PDFBox - Infinite loop | Hanno Böck and Andreas Bogk | ?-1.16|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-2115}TIKA-2115}}
- | Apache POI - OOM parsing OLE object| Thomas Galla | ?-1.15|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/COMPRESS-382}COMPRESS-382}}
- | Commons Compress - OOM detecting corrupt LZMA | Luís Filipe Nassif | ?-1.15|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/COMPRESS-386}COMPRESS-386}}
-and
-{{{https://issues.apache.org/jira/browse/TIKA-1631}TIKA-1631}}
- | Commons Compress - OOM detecting corrupt x-compress | Pavel Micka | ?-1.15|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-2045}TIKA-2045}} and
- {{{https://issues.apache.org/jira/browse/PDFBOX-3442}TIKA-3442}}
- | Apache PDFBox - OOM in font caching | Egbert | ?-1.13|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-1866}TIKA-1866}} and
- {{{https://issues.apache.org/jira/browse/TIKA-954}TIKA-954}}
- | Apache POI - OOM in DOCX and PPTX because of bug in Piccolo parser| Rob Tulloh and Shawn Johnson | ?-1.13|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-2040}TIKA-2040}}
- | GC-Overload and OOM in CHMParser| Luís Filipe Nassif | ?-1.13|
-*-------------*-------------*----------------*------------------*
-|{{{https://www.cvedetails.com/cve/CVE-2016-6809}CVE-2016-6809}}
- | jmatio - Deserialization Vulnerability in MATLAB parser| Pierre Ernst | 1.6-1.13|
-*-------------*-------------*----------------*------------------*
-|{{{https://www.cvedetails.com/cve/CVE-2016-4434}CVE-2016-4434}}
- | XXE Vulnerability in several parsers | Arthur Khashaev, Seulgi Kim, Mesut Timur (and Tim Allison while remediating initial issue reported by Arthur et al.)| 0.10-1.12|
-*-------------*-------------*----------------*------------------*
-|{{{https://nvd.nist.gov/vuln/detail/CVE-2016-2175}CVE-2016-2175}}
- | XML External Entity (XXE) in PDFBox | ???| ?-1.12|
-*-------------*-------------*----------------*------------------*
-|{{{https://www.cvedetails.com/cve/CVE-2015-3271}CVE-2015-3271}}
- | Remote Access to host files via tika-server| Tim Allison | 1.9?-1.10|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/PDFBOX-2811}PDFBOX-2811}}
- | Apache PDFBox - Infinite Loop| Andreas Lehmkühler | ?-1.10|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/PDFBOX-2200}PDFBOX-2200}}
- | Apache PDFBox - Slowly building memory leak because of static caching of fonts| Matthew Buckett | ?-1.6|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-1471}TIKA-1471}}
- | Apache PDFBox - OOM with corrupt PDF| Alan Burlison | ?-1.6|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-788}TIKA-788}}
- | Infinite Loop in DWG | Stas Shaposhnikov | ?-1.4?|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-1132}TIKA-1132}}
- | Apache POI - Nearly Infinite Loop in XLS| Ryan Krueger | ?-1.4|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-1179}TIKA-1179}}
- | Infinite Loop in corrupt MP3| Marius Dumitru Florea| ?-1.4|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-866}TIKA-866}}
- | OOM reading Tika config file| Stephan Mühlstrasser | ?-1.1|
-*-------------*-------------*----------------*------------------*
-
-
- Third party vulnerabilities that may or may not be triggerable via regular
- use of Apache Tika.
-
-*-------------*-------------*----------------*------------------*
-|CVE or Vulnerability| Description | Reporter | Affected Versions|
-*-------------*-------------*----------------*------------------*
-| {{{https://nvd.nist.gov/vuln/detail//CVE-2018-10237} CVE-2018-10237}}
- | Unbounded memory allocation in Google Guava|Pat Cashman (notified Tika team)|?-1.20|
-*-------------*-------------*----------------*------------------*
-|{{{https://nvd.nist.gov/vuln/detail/CVE-2018-19362}CVE-2018-19362}}
- |FaxterXML jackson-databind may allow attackers to have unspecified impact from polymorphic deserialization |Pat Cashman (notified Tika team)| ?-1.20|
-*-------------*-------------*----------------*------------------*
-
-Acronyms and Terms
-
- * Command Execution -- A malicious client could execute anything on tika-server's commandline
+ Parsing is dangerous. Bad things can happen when parsing untrusted data. Apache Tika is primarily designed to
+ work with trusted/sanitized data. Users are responsible for handling crashes and other consequences from
+ parsing untrusted data. See {{{https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika} the Robustness of Apache Tika}}
+ for guidance on how to run Tika more safely.
- * Deserialization Vulnerability -- {{{https://www.owasp.org/index.php/Deserialization_Cheat_Sheet}OWASP's
- Cheat Sheet}}. A malicious actor could run arbitrary code on your computer.
+ Further, mime detection and content extraction are both inherently challenging and prone to errors.
+ We advise against trusting without verification either mime detection or content extraction in high risk
+ applications such as, for example, cross-domain filtering or search.
- * OOM -- Out of Memory Error -- Parsers may allocate more memory than is available. This can sometimes be caused
- by parsers not performing sanity checks before allocation. See, for example: {{{https://issues.apache.org/jira/browse/TIKA-1631}TIKA-1631}}
+ Tika is not designed to identify or render safe files that are crafted to create parser differentials
+ (such as with polyglots, chimeras, schizophrenic files or ...).
- * XXE -- {{{https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing}
- XML External Entity Processing}} A malicious client could access data on your system.
+ Files can be crafted to evade detection, hinder analysis or otherwise cause mayhem in countless ways.
+ Running {{{https://cwiki.apache.org/confluence/display/TIKA/TikaServer}} tika-server} adds its own security risks.
+ Depending on the settings and what modules are loaded ({{{https://cwiki.apache.org/confluence/display/TIKA/tika-pipes} tika-pipes}}, for example),
+ it is possible to grant read and write access at the same level as the user running the application.
+ We strongly encourage defense in depth with tika-server, including isolating access to its endpoints, setting up two-way TLS,
+ and limiting its user permissions.
+ The project makes every effort to prevent Denial of Service attacks and other software vulnerabilities,
+ and we welcome reports and example proof-of-concept files. Some Denial of Service attacks are not easily fixed,
+ and users need to take precautions when parsing untrusted data.
+
+ We welcome suggestions and pull requests for hardening the code base.
+
+ See our {{{/security.html} Security page}} for fixed vulnerabilities.
Modified: tika/site/src/site/apt/security.apt
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/security.apt?rev=1926757&r1=1926756&r2=1926757&view=diff
==============================================================================
--- tika/site/src/site/apt/security.apt (original)
+++ tika/site/src/site/apt/security.apt Fri Jun 27 16:49:13 2025
@@ -24,6 +24,7 @@ Security
 vulnerabilities in Apache Tika or its dependencies. Please
 help us fill this in with more details.
+ See also see our {{{/security-model.html} security model}}.
 *-------------*-------------*----------------*------------------*
 |CVE or Vulnerability| Description | Reporter | Affected Versions|
