This is an automated email from the ASF dual-hosted git repository.

tallison pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tika.git


The following commit(s) were added to refs/heads/main by this push:
     new 46b002d97e TIKA-4645 - part 2, general updates for alpha release -- update docs
46b002d97e is described below

commit 46b002d97ef33740c6700f15560fe30ec1fb124e
Author: tallison <[email protected]>
AuthorDate: Mon Feb 2 14:22:04 2026 -0500

    TIKA-4645 - part 2, general updates for alpha release -- update docs
---
 docs/modules/ROOT/pages/pipes/unpack-config.adoc   | 71 ++++++++++++++++++++++
 docs/modules/ROOT/pages/roadmap.adoc               |  6 +-
 docs/modules/ROOT/pages/security.adoc              | 47 ++++++++++++++
 .../ROOT/pages/using-tika/server/index.adoc        | 50 +++++++++++++++
 .../tika/parser/external/ExternalParser.java       |  4 ++
 5 files changed, 175 insertions(+), 3 deletions(-)

diff --git a/docs/modules/ROOT/pages/pipes/unpack-config.adoc b/docs/modules/ROOT/pages/pipes/unpack-config.adoc
index ce9ddd159d..45f15cdb1c 100644
--- a/docs/modules/ROOT/pages/pipes/unpack-config.adoc
+++ b/docs/modules/ROOT/pages/pipes/unpack-config.adoc
@@ -192,6 +192,77 @@ When the limit is reached:
 
 Set `maxUnpackBytes=-1` to disable the limit. This is not recommended for untrusted input.
 
+== Frictionless Data Package Output
+
+The UNPACK mode can output files in https://frictionlessdata.io/[Frictionless Data Package] format,
+a standard for packaging data files with their metadata. This format includes a `datapackage.json`
+manifest with file checksums and MIME types, making it easy to verify and process extracted files.
+
+=== Enabling Frictionless Output
+
+Set `outputFormat` to `FRICTIONLESS` in your UnpackConfig:
+
+[source,json]
+----
+{
+  "parseContext": {
+    "parseMode": "UNPACK",
+    "unpack-config": {
+      "outputFormat": "FRICTIONLESS",
+      "includeFullMetadata": true
+    }
+  }
+}
+----
+
+=== Output Structure
+
+When using Frictionless output format, the ZIP archive contains:
+
+[source]
+----
+output.zip
+├── datapackage.json      # Manifest with file list, SHA256 hashes, mimetypes
+├── metadata.json         # Full RMETA metadata (if includeFullMetadata=true)
+└── unpacked/
+    ├── 00000001.pdf
+    ├── 00000002.png
+    └── ...
+----
+
+The `datapackage.json` file contains:
+
+* List of all extracted files as "resources"
+* SHA256 hash for each file
+* MIME type for each file
+* File size in bytes
+
+=== Frictionless Configuration Options
+
+[cols="2,1,2,3"]
+|===
+|Property |Type |Default |Description
+
+|`outputFormat`
+|STANDARD, FRICTIONLESS
+|`STANDARD`
+|Output format for the ZIP archive. Use `FRICTIONLESS` for Data Package format.
+
+|`includeFullMetadata`
+|boolean
+|`false`
+|Include a `metadata.json` file with full RMETA-style metadata for all extracted files.
+|===
+
+=== CLI Usage
+
+Extract files in Frictionless format using the CLI:
+
+[source,bash]
+----
+java -jar tika-app.jar --unpack --unpack-format=FRICTIONLESS -i input.docx -o output/
+----
+
 == Code Examples
 
 For working code examples, see:
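
The "Frictionless Data Package Output" section added above says the manifest carries a SHA256 hash per extracted file. As an illustration of how a consumer might use those hashes, the following Java sketch computes a SHA-256 digest and compares it with an expected hex value. It is not part of this commit: the class and method names (`Sha256Check`, `sha256Hex`, `matches`) are hypothetical, and because the diff does not show the field layout of `datapackage.json`, the expected hash is assumed to be available as a plain hex string.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not a Tika class.
public class Sha256Check {

    // Returns the lowercase hex SHA-256 digest of the given bytes.
    public static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so each byte is rendered as exactly two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Compares the digest of data with an expected hex string (assumed to be
    // a bare hex value with no algorithm prefix), ignoring case and whitespace.
    public static boolean matches(byte[] data, String expectedHex) {
        return sha256Hex(data).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) {
        byte[] sample = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(sha256Hex(sample));
    }
}
```

In practice the bytes would come from a file under `unpacked/` (for example via `java.nio.file.Files.readAllBytes`) and the expected value from the matching resource entry in the manifest.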
diff --git a/docs/modules/ROOT/pages/roadmap.adoc b/docs/modules/ROOT/pages/roadmap.adoc
index 3e28829a43..f58fa4905e 100644
--- a/docs/modules/ROOT/pages/roadmap.adoc
+++ b/docs/modules/ROOT/pages/roadmap.adoc
@@ -37,10 +37,10 @@ with traditional calendars.
 |April 2025
 |End support for 2.x (and Java 8)
 
-|January 2026
+|March 2026
 |Release 4.0.0
 
-|June 2026
+|March 2027
 |End support for 3.x (and Java 11)
 |===
 
@@ -65,7 +65,7 @@ with traditional calendars.
 |4.x
 |17
 |jakarta
-|January 2026
+|March 2026
 |TBD
 
 |5.x
diff --git a/docs/modules/ROOT/pages/security.adoc b/docs/modules/ROOT/pages/security.adoc
index ddc09b7215..0357714508 100644
--- a/docs/modules/ROOT/pages/security.adoc
+++ b/docs/modules/ROOT/pages/security.adoc
@@ -32,3 +32,50 @@ For information about known security vulnerabilities (CVEs) in Apache Tika and t
 remediation, please see:
 
 * https://tika.apache.org/security.html[Apache Tika Security Vulnerabilities]
+
+== External Command Security
+
+Apache Tika can be configured to use external system commands for certain operations,
+such as the `FileCommandDetector` and `ExternalParser` components.
+
+CAUTION: External command configuration should only be performed by trusted administrators.
+Never allow untrusted users to configure command paths or arguments.
+
+=== Security Best Practices
+
+1. **Restrict configuration access**: Only allow administrators to modify Tika configuration
+   files that specify external commands.
+
+2. **Use absolute paths**: Always configure external commands with absolute paths to prevent
+   PATH manipulation attacks.
+
+3. **Sandbox execution**: Consider running Tika in a container or sandbox environment to
+   limit the impact of any command execution vulnerabilities.
+
+4. **Audit command configuration**: Regularly review configured external commands and
+   their arguments.
+
+=== Affected Components
+
+* `FileCommandDetector`: Uses the system `file` command for MIME type detection
+* `ExternalParser`: Executes arbitrary external programs to extract content
+* `ExternalEmbedder`: Uses external tools to embed content
+
+== Credential Handling
+
+=== Password Storage in Memory
+
+Tika stores some credentials as Java String objects, which remain in memory until
+garbage collected. For environments with strict security requirements:
+
+1. **Use environment variables**: Configure credentials via environment variables
+   rather than configuration files where possible.
+
+2. **Use secret managers**: Integrate with HashiCorp Vault, AWS Secrets Manager,
+   or similar services for production deployments.
+
+3. **Enable encryption**: Use the AES encryption option in `HttpClientFactory`
+   for stored passwords.
+
+4. **Minimize credential scope**: Use credentials with minimum necessary privileges
+   and rotate them regularly.
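
The "Use absolute paths" practice in the External Command Security section above can be checked mechanically before a command is ever launched. The following Java sketch shows one way an administrator's tooling might do that. It is not part of this commit and not a Tika API: `CommandPathCheck` and `hasAbsoluteExecutable` are hypothetical names, and the command line is assumed to be available as a string array whose first element is the executable.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

// Hypothetical pre-flight check for illustration only; not a Tika class.
public class CommandPathCheck {

    // Returns true only when the executable (first element) is an absolute path,
    // so that resolution never depends on the PATH environment variable.
    public static boolean hasAbsoluteExecutable(String[] commandLine) {
        if (commandLine == null || commandLine.length == 0 || commandLine[0] == null) {
            return false;
        }
        try {
            return Paths.get(commandLine[0]).isAbsolute();
        } catch (InvalidPathException e) {
            // A string that is not a valid path cannot be a safe executable reference.
            return false;
        }
    }

    public static void main(String[] args) {
        // A bare command name is resolved through PATH and is therefore rejected.
        System.out.println(hasAbsoluteExecutable(new String[] {"file", "-b"}));
    }
}
```

A check like this only guards against PATH manipulation; it does not replace restricting who may edit the configuration or sandboxing the process.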
diff --git a/docs/modules/ROOT/pages/using-tika/server/index.adoc b/docs/modules/ROOT/pages/using-tika/server/index.adoc
index 59017b943b..dbb086b5e6 100644
--- a/docs/modules/ROOT/pages/using-tika/server/index.adoc
+++ b/docs/modules/ROOT/pages/using-tika/server/index.adoc
@@ -36,3 +36,53 @@ The server starts on port 9998 by default.
 == Topics
 
 * xref:using-tika/server/tls.adoc[TLS/SSL Configuration] - Secure your server with TLS and mutual authentication
+
+== Security Configuration
+
+=== Config Endpoint Protection
+
+By default, the `/config` endpoints that expose server configuration are disabled for security
+reasons. These endpoints can reveal sensitive information about your server configuration,
+including parser settings and system properties (see https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3271[CVE-2015-3271]).
+
+The protected endpoints include:
+
+* `/config` - Returns the server's full configuration
+* `/config/parsers` - Returns configured parsers
+* `/config/detectors` - Returns configured detectors
+* `/config/mimeTypes` - Returns MIME type mappings
+
+=== Enabling Config Endpoints
+
+To enable these endpoints:
+
+[source,json]
+----
+{
+  "server": {
+    "enableUnsecureFeatures": true
+  }
+}
+----
+
+WARNING: Only enable `enableUnsecureFeatures` if you have secured access to Tika Server through
+network controls (firewalls, private subnets), a reverse proxy (nginx, Apache httpd), or
+xref:using-tika/server/tls.adoc[2-way TLS authentication]. Exposing config endpoints to
+untrusted networks can help attackers identify vulnerabilities and craft targeted attacks.
+
+=== Command Line Usage
+
+You can also enable unsecure features via command line:
+
+[source,bash]
+----
+java -jar tika-server-standard.jar --enableUnsecureFeatures
+----
+
+=== Security Best Practices
+
+1. **Keep config endpoints disabled** in production (default behavior)
+2. **Use network controls** to restrict access to the Tika Server (firewall rules, private subnets)
+3. **Consider TLS** for encrypted communication - see xref:using-tika/server/tls.adoc[TLS Configuration]
+4. **Run with minimal privileges** - don't run Tika Server as root
+5. **Monitor logs** for unusual access patterns
diff --git a/tika-core/src/main/java/org/apache/tika/parser/external/ExternalParser.java b/tika-core/src/main/java/org/apache/tika/parser/external/ExternalParser.java
index 38d4f61563..0e17384928 100644
--- a/tika-core/src/main/java/org/apache/tika/parser/external/ExternalParser.java
+++ b/tika-core/src/main/java/org/apache/tika/parser/external/ExternalParser.java
@@ -55,7 +55,11 @@ import org.apache.tika.sax.XHTMLContentHandler;
 /**
  * Parser that uses an external program (like catdoc or pdf2txt) to extract
  * text content and metadata from a given document.
+ *
+ * @deprecated Use {@link org.apache.tika.parser.external2.ExternalParser} instead.
+ *             This class will be removed in a future version of Tika.
  */
+@Deprecated
 public class ExternalParser implements Parser {
 
     private static final Logger LOG = LoggerFactory.getLogger(ExternalParser.class);
