Copied: tika/site/src/site/apt/security-model.apt (from r1925951, tika/site/src/site/apt/security.apt)
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/security-model.apt?p2=tika/site/src/site/apt/security-model.apt&p1=tika/site/src/site/apt/security.apt&r1=1925951&r2=1926757&rev=1926757&view=diff
==============================================================================
--- tika/site/src/site/apt/security.apt (original)
+++ tika/site/src/site/apt/security-model.apt Fri Jun 27 16:49:13 2025
@@ -1,5 +1,5 @@
                           ----------------
-                              Security
+                              Security Model
                           ----------------
 
 ~~ Licensed to the Apache Software Foundation (ASF) under one or more
@@ -17,220 +17,33 @@
 ~~ See the License for the specific language governing permissions and
 ~~ limitations under the License.
 
-Security
+Security Model
 
-   The following is an incomplete list of known and fixed
-   Critical Vulnerabilities and Exposures (CVEs) and other
-   vulnerabilities in Apache Tika or its dependencies.  Please
-   help us fill this in with more details.
-
-
-*-------------*-------------*----------------*------------------*
-|CVE or Vulnerability| Description | Reporter       | Affected Versions|
-*-------------*-------------*----------------*------------------*
-| {{{https://nvd.nist.gov/vuln/detail/CVE-2023-42503} CVE-2023-42503}}
-              | commons-compress uncontrolled resource consumption vulnerability while parsing tar files| ???  | ???->2.9.0 |
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread/wfno8mf5nlcvbs78z93q9thgrm30wwfh} CVE-2022-33879}}
-              | Regex DoS in StandardsExtractingContentHandler; incomplete fix for CVE-2022-30973/CVE-2022-30216 and a new one | Tony Torralba, Jaroslav Lobačevski and Tim Allison  |???-2.4.0 and ???-1.28.3|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread/gqvb5t4p7tmdpl0y5bdbf72pgxj04h7p} CVE-2022-30973}}
-              | Regex DoS in StandardsExtractingContentHandler; missed fix in 1.28.2 | Cathy Hu, SUSE Software Solutions Germany GmbH |???-1.28.2|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread/t3tb51sf0k2pmbnzsrrrm23z9r1c10rk} CVE-2022-25169}}
-              | BPGParser Memory Usage DoS | ??? |???-2.3.0 and ???-1.28.1|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread/dh3syg68nxogbmlg13srd6gjn3h2z6r4} CVE-2022-30216}}
-              | Regex DoS in StandardsExtractingContentHandler | CodeQL team members Tony Torralba and Joseph Farebrother |???-2.3.0 and ???-1.28.1|
-*-------------*-------------*----------------*------------------*
-| {{{https://nvd.nist.gov/vuln/detail/CVE-2021-44832} CVE-2021-44832}}
-              | Remote Code Execution via JDBC Appender in log4j2 | ??? |2.0.0-BETA-2.2.1|
-*-------------*-------------*----------------*------------------*
-| {{{https://nvd.nist.gov/vuln/detail/CVE-2021-44228} CVE-2021-44228}}
-              | Critical Remote Code Execution in log4j2 | ??? |2.0.0-BETA-2.1.0|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/ra2ab0ce69ce8aaff0773b8c1036438387ce004c2afc6f066626e205e%40%3Cusers.pdfbox.apache.org%3E} CVE-2021-31812}}
-              | Infinite loop when loading a crafted PDF in PDFBox before 2.0.24 | Chaoyuan Peng |?-1.26|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/re3bd16f0cc8f1fbda46b06a4b8241cd417f71402809baa81548fc20e%40%3Cusers.pdfbox.apache.org%3E} CVE-2021-31811}}
-              | OutOfMemoryException when loading a crafted PDF in PDFBox before 2.0.24 | Chaoyuan Peng |?-1.26|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/r915add4aa52c60d1b5cf085039cfa73a98d7fae9673374dfd7744b5a%40%3Cdev.tika.apache.org%3E} CVE-2021-28657}}
-              | Infinite loop in the MP3Parser.| Khaled Nassar |?-1.25|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/rf35026148ccc0e1af133501c0d003d052883fcc65107b3ff5d3b61cd%40%3Cusers.pdfbox.apache.org%3E}CVE-2021-27906}}
-              | Out of memory error while loading a file in PDFBox before 2.0.23.| Fabian Meumertzheim |?-1.25|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/r4717f902f8bc36d47b3fa978552a25e4ed3ddc2fffb52b94fbc4ab36%40%3Cusers.pdfbox.apache.org%3E} CVE-2021-27807}}
-              | Infinite loop while loading a file in PDFBox before 2.0.23.| Fabian Meumertzheim |?-1.25|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/r4d943777e36ca3aa6305a45da5acccc54ad894f2d5a07186cfa2442c%40%3Cdev.tika.apache.org%3E} CVE-2020-9489}}
-              | System.exit vulnerability in Tika's OneNote Parser; out of memory errors and/or infinite loops in Tika's ICNSParser, MP3Parser, MP4Parser, SAS7BDATParser, OneNoteParser and ImageParser.| Tim Allison |1.0-1.24|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/r463b1a67817ae55fe022536edd6db34e8f9636971188430cbcf8a8dd%40%3Cdev.tika.apache.org%3E} CVE-2020-1950}}
-              | Excessive memory usage (DoS) vulnerability in Apache Tika's PSDParser |Pierre Ernst |1.0-1.23|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/rd8c1b42bd0e31870d804890b3f00b13d837c528f7ebaf77031323172%40%3Cdev.tika.apache.org%3E} CVE-2020-1951}}
-              | Infinite Loop (DoS) vulnerability in Apache Tika's PSDParser |Tim Allison |1.0-1.23|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/fe876a649d9d36525dd097fe87ff4dcb3b82bb0fbb3a3d71fb72ef61@%3Cdev.tika.apache.org%3E} CVE-2019-10094}}
-              | StackOverflow from Crafted Package/Compressed Files in Apache Tika's RecursiveParserWrapper|Tim Allison; files contributed by Matthew Barber and Erling Ellingsen |1.7-1.21|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/a5a44eff1b9eda3bc69d22943a1030c43d376380c75d3ab04d0c1a21@%3Cdev.tika.apache.org%3E} CVE-2019-10093}}
-              | Denial of Service in Apache Tika's 2003ml and 2006ml Parsers|Tim Allison|1.19-1.21|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/1c63555609b737c20d1bbfa4a3e73ec488e3408a84e2f5e47e1b7e08@%3Cdev.tika.apache.org%3E} CVE-2019-10088}}
-              | OOM from a crafted Zip File in Apache Tika's RecursiveParserWrapper|RunningSnail|1.7-1.21|
-*-------------*-------------*----------------*------------------*
-| {{{https://issues.apache.org/jira/browse/PDFBOX-4550} PDFBOX-4550}}
-              | OOM from corrupt ToUnicode stream in PDFs|Tilman Hausherr|?-1.21|
-*-------------*-------------*----------------*------------------*
-| {{{https://nvd.nist.gov/vuln/detail/CVE-2019-0228} CVE-2019-0228}}
-              | XML External Entity (XXE) in xfdf loading in PDFBox (regular Tika parsing would likely not be vulnerable) |Kurt Boberg|?-1.20|
-*-------------*-------------*----------------*------------------*
-| {{{https://nvd.nist.gov/vuln/detail/CVE-2018-20346} CVE-2018-20346}}
-              | (Provided) SQLite before 3.52.3 allows remote attackers to execute arbitrary code|Pat Cashman (notified Tika team)|?-1.20|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/7c021a4ea2037e52e74628e17e8e0e2acab1f447160edc8be0eae6d3@%3Cdev.tika.apache.org%3E}CVE-2018-17197}}
-              | Infinite Loop in Tika's SQLite3Parser |Tim Allison |1.8-1.19.1|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/88de8350cda9b184888ec294c813c5bd8a2081de8fd3666f8904bc05@%3Cdev.tika.apache.org%3E}CVE-2018-11796}}
-              | XML Entity Expansion in Tika's SAXParsers after reset() |Slava Gorelik |?-1.19|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/b7eb142436d2620646d1da087ca004159241d3930a9463b476700a4d@%3Cdev.pdfbox.apache.org%3E}CVE-2018-11797}}
-              | Very long loop parsing page tree in PDFBox |Shawn Rasheed and Jens Dietrich  |?-1.19|
-*-------------*-------------*----------------*------------------*
-| {{{http://mail-archives.us.apache.org/mod_mbox/www-announce/201808.mbox/%[email protected]%3E}CVE-2018-11771}}
-              | Infinite Loop in Commons-Compress ZipArchiveInputStream |Tobias Ospelt  |?-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/72df7a3f0dda49a912143a1404b489837a11f374dfd1961061873a91@%3Cdev.tika.apache.org%3E}CVE-2018-8017}}
-              | Infinite Loop in IptcAnpaParser|Rohan Padhye and Tobias Ospelt |1.2-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/9f62f742fd4fcd81654a9533b8a71349b064250840592bcd502dcfb6@%3Cusers.pdfbox.apache.org%3E}CVE-2018-8036}}
-              | Infinite Loop leading to OOM in PDFBox's AFMParser|Tobias Ospelt  |?-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-12418}CVE-2018-12418}}
-              | Infinite Loop in junrar|Tobias Ospelt  |?-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/5553e10bba5604117967466618f219c0cae710075819c70cfb3fb421@%3Cdev.tika.apache.org%3E}CVE-2018-11761}}
-              | XML Entity Expansion Vulnerability|Renfei (Brian) Wang  |0.1-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://lists.apache.org/thread.html/ab2e1af38975f5fc462ba89b517971ef892ec3d06bee12ea2258895b@%3Cdev.tika.apache.org%3E}CVE-2018-11762}}
-              | Rare Zip Slip Vulnerability in tika-app|Tim Allison  |0.9-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{http://mail.openjdk.java.net/pipermail/sound-dev/2015-September/000349.html}RIFFReader}}
-              | Infinite Loop in AudioParser in Java 8 and 9|Sergey Bylokhov and Tobias Ospelt  |?-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://issues.apache.org/jira/browse/TIKA-2446}TIKA-2446}}
-              | OOM detecting OPCPackage files with corrupt ZIP|Thorsten Schäfer  |?-1.18|
-*-------------*-------------*----------------*------------------*
-| {{{https://issues.apache.org/jira/browse/PDFBOX-4014}PDFBOX-4014}}
-              | Infinite loop in JBig2 (versions less than 3.0.0) | Hanno Böck | (if user supplied) ?-1.17|
-*-------------*-------------*----------------*------------------*
-| {{{https://www.cvedetails.com/cve/CVE-2018-1339}CVE-2018-1339}}
-              | Infinite loop in ChmParser|Tobias Ospelt  |?-1.17|
-*-------------*-------------*----------------*------------------*
-|{{{https://www.cvedetails.com/cve/CVE-2018-1338}CVE-2018-1338}}
-      | Infinite loop in BPGParser| Tobias Ospelt | ?-1.17|
-*-------------*-------------*----------------*------------------*
-|{{{http://mail-archives.apache.org/mod_mbox/www-announce/201804.mbox/%3CCAC1dCwVhrPRyFJMS5BbY02%2B495CUODrAzndqZkvKacJnXUSm%2Bw%40mail.gmail.com%3E}CVE-2018-1335}}
-      | Command Execution in tika-server | Tim Allison | ?-1.17|
-*-------------*-------------*----------------*------------------*
-|{{{https://www.cvedetails.com/cve/CVE-2017-12626}CVE-2017-12626}}
-      | Apache POI - Infinite loops in WMF, EMF, MSG and macros; OOMs in DOC, PPT and XLS | Tim Allison, Luís Filipe Nassif and Jerome Lacoste| ?-1.17|
-*-------------*-------------*----------------*------------------*
-|{{{https://nvd.nist.gov/vuln/detail/CVE-2018-1324}CVE-2018-1324}} and {{{https://issues.apache.org/jira/browse/COMPRESS-432}COMPRESS-432}}
-      | Commons Compress - Infinite loop in ZipFile | Luís Filipe Nassif and Anton Abashkin | ?-1.17|
-*-------------*-------------*----------------*------------------*
-|{{{https://www.cvedetails.com/cve/CVE-2018-7489/}CVE-2018-7489}}
-and {{{https://issues.apache.org/jira/browse/TIKA-2634}TIKA-2634}}
-      | Jackson - Deserialization vulnerability | Richard Cyganiak (notified Tika team) | ?-1.17|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/PDFBOX-3919}PDFBOX-3919}}
-      | Apache PDFBox - Infinite loop | Hanno Böck and Andreas Bogk | ?-1.16|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-2115}TIKA-2115}}
-      | Apache POI - OOM parsing OLE object| Thomas Galla | ?-1.15|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/COMPRESS-382}COMPRESS-382}}
-      | Commons Compress - OOM detecting corrupt LZMA | Luís Filipe Nassif | ?-1.15|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/COMPRESS-386}COMPRESS-386}}
-and
-{{{https://issues.apache.org/jira/browse/TIKA-1631}TIKA-1631}}
-      | Commons Compress - OOM detecting corrupt x-compress | Pavel Micka | ?-1.15|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-2045}TIKA-2045}} and
-  {{{https://issues.apache.org/jira/browse/PDFBOX-3442}TIKA-3442}}
-            | Apache PDFBox - OOM in font caching | Egbert | ?-1.13|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-1866}TIKA-1866}} and
-  {{{https://issues.apache.org/jira/browse/TIKA-954}TIKA-954}}
-            | Apache POI - OOM in DOCX and PPTX because of bug in Piccolo parser| Rob Tulloh and Shawn Johnson | ?-1.13|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-2040}TIKA-2040}}
-            | GC-Overload and OOM in CHMParser| Luís Filipe Nassif | ?-1.13|
-*-------------*-------------*----------------*------------------*
-|{{{https://www.cvedetails.com/cve/CVE-2016-6809}CVE-2016-6809}}
-            | jmatio - Deserialization Vulnerability in MATLAB parser| Pierre Ernst | 1.6-1.13|
-*-------------*-------------*----------------*------------------*
-|{{{https://www.cvedetails.com/cve/CVE-2016-4434}CVE-2016-4434}}
-            | XXE Vulnerability in several parsers | Arthur Khashaev, Seulgi Kim, Mesut Timur (and Tim Allison while remediating initial issue reported by Arthur et al.)| 0.10-1.12|
-*-------------*-------------*----------------*------------------*
-|{{{https://nvd.nist.gov/vuln/detail/CVE-2016-2175}CVE-2016-2175}}
-            | XML External Entity (XXE) in PDFBox | ???| ?-1.12|
-*-------------*-------------*----------------*------------------*
-|{{{https://www.cvedetails.com/cve/CVE-2015-3271}CVE-2015-3271}}
-            | Remote Access to host files via tika-server| Tim Allison | 1.9?-1.10|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/PDFBOX-2811}PDFBOX-2811}}
-            | Apache PDFBox - Infinite Loop| Andreas Lehmkühler | ?-1.10|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/PDFBOX-2200}PDFBOX-2200}}
-            | Apache PDFBox - Slowly building memory leak because of static caching of fonts| Matthew Buckett | ?-1.6|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-1471}TIKA-1471}}
-            | Apache PDFBox - OOM with corrupt PDF| Alan Burlison | ?-1.6|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-788}TIKA-788}}
-            | Infinite Loop in DWG | Stas Shaposhnikov | ?-1.4?|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-1132}TIKA-1132}}
-            | Apache POI - Nearly Infinite Loop in XLS| Ryan Krueger | ?-1.4|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-1179}TIKA-1179}}
-            | Infinite Loop in corrupt MP3| Marius Dumitru Florea| ?-1.4|
-*-------------*-------------*----------------*------------------*
-|{{{https://issues.apache.org/jira/browse/TIKA-866}TIKA-866}}
-            | OOM reading Tika config file|  Stephan Mühlstrasser | ?-1.1|
-*-------------*-------------*----------------*------------------*
-
-
-    Third party vulnerabilities that may or may not be triggerable via regular
-    use of Apache Tika.
-
-*-------------*-------------*----------------*------------------*
-|CVE or Vulnerability| Description | Reporter       | Affected Versions|
-*-------------*-------------*----------------*------------------*
-| {{{https://nvd.nist.gov/vuln/detail//CVE-2018-10237} CVE-2018-10237}}
-              | Unbounded memory allocation in Google Guava|Pat Cashman (notified Tika team)|?-1.20|
-*-------------*-------------*----------------*------------------*
-|{{{https://nvd.nist.gov/vuln/detail/CVE-2018-19362}CVE-2018-19362}}
-            |FaxterXML jackson-databind may allow attackers to have unspecified impact from polymorphic deserialization |Pat Cashman (notified Tika team)| ?-1.20|
-*-------------*-------------*----------------*------------------*
-
-Acronyms and Terms
-
-    * Command Execution -- A malicious client could execute anything on tika-server's commandline
+    Parsing is dangerous. Bad things, such as crashes, infinite loops and out-of-memory errors, can happen when parsing untrusted data. Apache Tika is primarily designed to
+    work with trusted/sanitized data. Users are responsible for handling crashes and other consequences of
+    parsing untrusted data. See {{{https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika} the Robustness of Apache Tika}}
+    for guidance on how to run Tika more safely.
 
-    * Deserialization Vulnerability -- {{{https://www.owasp.org/index.php/Deserialization_Cheat_Sheet}OWASP's
-      Cheat Sheet}}. A malicious actor could run arbitrary code on your computer.
+    Further, MIME detection and content extraction are both inherently challenging and error-prone.
+    We advise against trusting either MIME detection or content extraction without verification in high-risk
+    applications such as cross-domain filtering or search.
 
-    * OOM -- Out of Memory Error -- Parsers may allocate more memory than is available.  This can sometimes be caused
-      by parsers not performing sanity checks before allocation.  See, for example: {{{https://issues.apache.org/jira/browse/TIKA-1631}TIKA-1631}}
+    Tika is not designed to identify or render safe files that are crafted to create parser differentials
+    (for example, polyglots, chimeras or schizophrenic files).
 
-    * XXE -- {{{https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing}
-      XML External Entity Processing}} A malicious client could access data on your system.
+    Files can be crafted to evade detection, hinder analysis or otherwise cause mayhem in countless ways.
 
+    Running {{{https://cwiki.apache.org/confluence/display/TIKA/TikaServer} tika-server}} adds its own security risks.
+    Depending on the settings and which modules are loaded ({{{https://cwiki.apache.org/confluence/display/TIKA/tika-pipes} tika-pipes}}, for example),
+    tika-server can grant its clients read and write access at the same level as the user running the application.
+    We strongly encourage defense in depth with tika-server, including isolating access to its endpoints, setting up two-way (mutual) TLS,
+    and running it as a user with only the permissions it needs.
 
+    The project makes every effort to prevent Denial of Service attacks and other software vulnerabilities,
+    and we welcome reports and example proof-of-concept files. Some Denial of Service attacks are not easily fixed,
+    and users need to take precautions when parsing untrusted data.
+
+    We welcome suggestions and pull requests for hardening the code base.
+
+    See our {{{/security.html} Security page}} for fixed vulnerabilities.
 

Modified: tika/site/src/site/apt/security.apt
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/security.apt?rev=1926757&r1=1926756&r2=1926757&view=diff
==============================================================================
--- tika/site/src/site/apt/security.apt (original)
+++ tika/site/src/site/apt/security.apt Fri Jun 27 16:49:13 2025
@@ -24,6 +24,7 @@ Security
    vulnerabilities in Apache Tika or its dependencies.  Please
    help us fill this in with more details.
 
+   See also our {{{/security-model.html} security model}}.
 
 *-------------*-------------*----------------*------------------*
 |CVE or Vulnerability| Description | Reporter       | Affected Versions|

