[ 
https://issues.apache.org/jira/browse/TIKA-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084159#comment-18084159
 ] 

ASF GitHub Bot commented on TIKA-4742:
--------------------------------------

Copilot commented on code in PR #2844:
URL: https://github.com/apache/tika/pull/2844#discussion_r3319658781


##########
docs/modules/ROOT/pages/pipes/troubleshooting.adoc:
##########
@@ -112,6 +143,55 @@ When the watcher fires, the child exits via `System.exit`, which runs
 `AbstractExternalProcessParser`'s shutdown hook and cleans up any
 in-flight external subprocesses.
 
+== Log levels and sensitive data
+
+Tika Pipes treats `FetchKey` and `EmitKey` values as potentially sensitive --
+they typically contain file paths, URLs, object-store keys, or other identifiers
+that may be private to the data owner. The convention across pipes core and the
+bundled plugins is:
+
+[cols="1,3"]
+|===
+|Level |What is logged
+
+|`ERROR` / `WARN`
+|Failures, exceptions, and configuration problems. *Never* the literal
+ `fetchKey`/`emitKey` or any file content. When a failure refers to a
+ specific document, it is identified by the non-sensitive `FetchEmitTuple.id`
+ (e.g. `parse exception: id=abc-123`).
+
+|`INFO`
+|Lifecycle events 

> Review logging levels and configuration for 4.0.0-beta-1
> --------------------------------------------------------
>
>                 Key: TIKA-4742
>                 URL: https://issues.apache.org/jira/browse/TIKA-4742
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> There are a number of places where we've hit a stable state and should
> downgrade logging from info to debug.
> We are also still including fetchKeys and emitKeys in log messages in fetchers
> and emitters, which is not great from a security standpoint. We should demote
> those to {{trace()}}. We might consider adding MDC to inject
> fetchEmitTuple ids in fetchers and emitters.
> Then there are places where we log actual problems at info level.
> We should do a review of logging levels before 4.0.0-beta-1.
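
The convention described above (identify documents by id at warning level, demote the raw key to trace) might be sketched as follows. This is an illustration under stated assumptions, not Tika code: `KeySafeLogging`, `Tuple`, `failureMessage`, and `traceMessage` are hypothetical names, and `java.util.logging` is used only to keep the example dependency-free, with `Level.FINEST` standing in for `trace()`.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; none of these names are Tika APIs.
class KeySafeLogging {
    private static final Logger LOG =
            Logger.getLogger(KeySafeLogging.class.getName());

    // Minimal stand-in for a fetch/emit tuple: a non-sensitive id
    // plus a potentially sensitive fetch key.
    static final class Tuple {
        final String id;
        final String fetchKey;

        Tuple(String id, String fetchKey) {
            this.id = id;
            this.fetchKey = fetchKey;
        }
    }

    // Message for WARN/ERROR: refers to the document by id only.
    static String failureMessage(Tuple t, String what) {
        return what + ": id=" + t.id;
    }

    // Message for the lowest level: the only place the raw key appears.
    static String traceMessage(Tuple t) {
        return "fetch failed: id=" + t.id + " fetchKey=" + t.fetchKey;
    }

    static void logFailure(Tuple t, Exception e) {
        // Visible at default levels, carries no key.
        LOG.log(Level.WARNING, failureMessage(t, "parse exception"), e);
        // Supplier form: the string is built only if FINEST is enabled.
        LOG.log(Level.FINEST, () -> traceMessage(t));
    }
}
```

Keeping the two message builders separate makes it straightforward to check in a unit test that the warning-level text never contains the key.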



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
