Copilot commented on code in PR #2844:
URL: https://github.com/apache/tika/pull/2844#discussion_r3315389841


##########
docs/modules/ROOT/pages/pipes/troubleshooting.adoc:
##########
@@ -112,6 +143,55 @@ When the watcher fires, the child exits via `System.exit`, which runs
 `AbstractExternalProcessParser`'s shutdown hook and cleans up any
 in-flight external subprocesses.
 
+== Log levels and sensitive data
+
+Tika Pipes treats `FetchKey` and `EmitKey` values as potentially sensitive --
+they typically contain file paths, URLs, object-store keys, or other identifiers
+that may be private to the data owner. The convention across pipes core and the
+bundled plugins is:
+
+[cols="1,3"]
+|===
+|Level |What is logged
+
+|`ERROR` / `WARN`
+|Failures, exceptions, and configuration problems. *Never* the literal
+ `fetchKey`/`emitKey` or any file content. When a failure refers to a
+ specific document, it is identified by the non-sensitive `FetchEmitTuple.id`

Review Comment:
   This new convention says WARN logs never include literal 
`fetchKey`/`emitKey`, but the bundled JDBC emitter still logs `emitKey` at WARN 
when truncating values (`JDBCEmitter.StringNormalizer`, lines 447-448). Please 
either update the remaining WARN log to follow the convention or qualify the 
documentation so it matches the current plugins.
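
   For the first option, here is a minimal sketch of a truncation warning that
   follows the documented convention. The class name `TruncationWarning`, the
   method `format`, and the message wording are hypothetical and are not taken
   from `JDBCEmitter`; the only point carried over from the documentation under
   review is that the document is identified by the non-sensitive
   `FetchEmitTuple.id` instead of the literal `emitKey`.

```java
public class TruncationWarning {

    /**
     * Builds the text of a WARN-level record for a truncated column value.
     * Hypothetical helper: it takes the non-sensitive tuple id only, so the
     * emitKey and the truncated value cannot leak into the message.
     */
    public static String format(String tupleId, String column,
                                int originalLength, int maxLength) {
        return String.format(
                "truncating column '%s' from %d to %d chars (id=%s)",
                column, originalLength, maxLength, tupleId);
    }

    public static void main(String[] args) {
        // Illustrative values only; none of these come from the PR.
        System.out.println(format("doc-17", "title", 300, 256));
    }
}
```

   Keeping the `emitKey` out of the method signature altogether makes the
   convention harder to break by accident than filtering it at the call site.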



##########
docs/modules/ROOT/pages/pipes/troubleshooting.adoc:
##########
@@ -62,6 +62,37 @@ pick them up automatically. The default `pipes-fork-server-default-log4j2.xml`
 writes to `SYSTEM_ERR`, so inheritance is what makes those records visible
 to your observability stack.
 
+=== Telling fork lines from parent lines
+
+Since the fork and parent share a single stdio stream, the bundled
+`pipes-fork-server-default-log4j2.xml` pattern adds two orthogonal markers
+so you can read the interleaved output:
+
+* `[fork]` -- present only on lines emitted by a forked `PipesServer`
+  JVM. Lines from the parent process (`PipesClient`, `AsyncProcessor`,
+  `ConnectionHandler`, `tika-server`, `tika-grpc`, etc.) do not carry
+  this tag. Different mechanism on each side: the fork has it injected
+  via the bundled pattern's literal `[fork]` token; the parent does
+  not include it in its own log4j2/logback patterns.
+
+* `pipesClientId=N` -- *the same value on both sides of a pair*. The
+  parent's `PipesClient #N` always connects to the fork running with
+  `-DpipesClientId=N`, so the same N threads correlation across the

Review Comment:
   The phrase "the same N threads correlation" is grammatically unclear. It
   presumably means that the same N enables correlation across the process
   boundary; please reword it to say so.
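
   To illustrate the correlation this passage describes, here is a small sketch
   that classifies interleaved lines by the two markers. The class
   `PipesLogLine` and the sample log lines are hypothetical; only the `[fork]`
   token and the `pipesClientId=N` field come from the documentation under
   review, and the exact layout of real log lines is an assumption.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipesLogLine {

    // Matches "pipesClientId=N" with N up to 9 digits (fits in an int).
    private static final Pattern CLIENT_ID =
            Pattern.compile("pipesClientId=(\\d{1,9})(?!\\d)");

    /** True if the line carries the literal [fork] tag. */
    public static boolean isFork(String line) {
        return line.contains("[fork]");
    }

    /** Extracts N from "pipesClientId=N", or empty if the field is absent. */
    public static OptionalInt clientId(String line) {
        Matcher m = CLIENT_ID.matcher(line);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Invented sample lines; the real pattern layout may differ.
        String[] lines = {
            "WARN pipesClientId=3 restarting fork",
            "WARN [fork] pipesClientId=3 parse timeout",
            "INFO [fork] pipesClientId=5 ready"
        };
        // Show both sides of pair 3, labelled by origin.
        for (String line : lines) {
            if (clientId(line).orElse(-1) == 3) {
                System.out.println((isFork(line) ? "fork   | " : "parent | ") + line);
            }
        }
    }
}
```

   Filtering on the shared N selects both halves of a parent/fork pair, and the
   `[fork]` tag then tells the two sides apart, which is the reading the
   reworded sentence should convey.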
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
