Hold on. It might be the difference in log4j2.xml. I've changed logs from stderr to stdout. That might be a problem. Checking.
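
For reference, a minimal sketch of a log4j2.xml for the forked PipesServer that sends all logging to a file, so nothing is written to stdout or stderr (the channel the 3.x parent process reads, per Tim's note quoted below). This is an illustration under assumptions, not the actual config from either run: the appender name `PipesServerFile` and the path `/tmp/tika-pipes-server.log` are hypothetical placeholders, the pattern only approximates the log lines quoted below, and `status="off"` is an assumption meant to keep Log4j's own status messages off the console as well.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: appender name and file path are hypothetical placeholders. -->
<!-- status="off" is assumed here to silence Log4j's internal status output on the console. -->
<Configuration status="off">
  <Appenders>
    <!-- File appender: forked-process logs go to a file instead of stdout/stderr -->
    <File name="PipesServerFile" fileName="/tmp/tika-pipes-server.log">
      <PatternLayout pattern="%-5p [%t] %d{HH:mm:ss,SSS} %c %m%n"/>
    </File>
  </Appenders>
  <Loggers>
    <!-- No Console appender is referenced anywhere, so the console streams stay clean -->
    <Root level="debug">
      <AppenderRef ref="PipesServerFile"/>
    </Root>
  </Loggers>
</Configuration>
```

The forked process would pick this up through the same `-Dlog4j.configurationFile=...` argument that appears in the command lines quoted below.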

On Fri, Feb 20, 2026 at 11:06 AM Mikhail Khludnev wrote:

> Thanks for the clue, Tim. BUT.
>
> The same 3.x-SNAPSHOT .jars and configs work smoothly on a bare Linux host
> (excluding Docker from the equation):
>
> DEBUG [pool-3-thread-4] 17:01:27,928 org.apache.tika.pipes.PipesClient pipesClientId=3: commandline: [java, -cp, Downloads/tika-emitter-s3-3.3.0-SNAPSHOT.jar:Downloads/tika-pipes-iterator-s3-3.3.0-SNAPSHOT.jar:Downloads/tika-app-3.3.0-jdk21-SNAPSHOT.jar:Downloads/tika-fetcher-s3-3.3.0-SNAPSHOT.jar, -Djava.awt.headless=true, -DpipesClientId=3, -Dlog4j.configurationFile=log4j2.xml, org.apache.tika.pipes.PipesServer, /home/mikhail-khludnev/git/norn-budget-control-demo/v-conversion-tikamd-job/docker/tika-config.xml, 100000, 300000, 1500000]
> DEBUG [pool-3-thread-1] 17:01:27,979 org.apache.tika.pipes.async.AsyncProcessor fetchEmitWorker finished, total 1
> DEBUG [pool-3-thread-6] 17:01:27,989 org.apache.tika.pipes.async.AsyncEmitter cache size: (0) bytes and extract count: 0
> 2026-02-19T14:01:28.290084314Z main DEBUG Apache Tika application 3.3 initializing configuration XmlConfiguration[location=/home/mikhail-khludnev/log4j2.xml, lastModified=2026-02-17T23:05:22.608Z]
> ...
> DEBUG [main] 17:01:29,263 org.apache.tika.pipes.PipesServer pipes server initialized
> DEBUG [main] 17:01:29,303 org.apache.tika.pipes.fetcher.s3.S3Fetcher about to fetch fetchkey=path/to/4Mb.pdf from bucket (test-bucket)
>
> However, after I directed the child server process logs to a logfile, the
> Docker run passed! Thanks, Tim. I wonder how it works and how the container
> environment impacts console redirection.
>
> Looking forward to the 3.3 release with Markdown! Overall, could you share
> your vision regarding using Tika in short-lived containers, and whether it
> makes sense at all? Which should one choose: the Tika app CLI in batch mode
> or TikaPipes?
> Thank you twice!
>
>
> On Fri, Feb 20, 2026 at 3:52 AM Tim Allison wrote:
>
>> Ugh. Thank you for reporting this.
>>
>> The problem may be that the logger from the forked process is writing to
>> stdout or stderr (can’t remember off top of my head) which is the comms
>> channel in 3.x to the forking process. We’ve fixed this in 4.x.
>>
>> If you modify forked process logging to write to file or the other, you
>> should be ok.
>>
>> Please let us know how it goes.
>>
>>
>> On Thu, Feb 19, 2026 at 3:47 PM Mikhail Khludnev wrote:
>>
>> > FWIW, just to let you know about a dead end.
>> >
>> > I'm a big fan of serverless containers (see TIKA-4529), but I decided
>> > to go further and use the S3 fetcher and S3 emitter, which led me to
>> > TikaAsyncCLI. I've put it into Docker with tesseract, etc.
>> > Finally, it pulls a 4Mb pdf from S3, spins up the TikaServer JVM, which
>> > launches those binary tools to check their availability, and just dies:
>> >
>> > org.apache.tika.pipes.PipesClient pipesClientId=0: commandline: [java, -cp, /tika-emitter-s3.jar:/tika-fetcher-s3.jar:/tika-pipes-iterator-s3.jar:/tika-app.jar, -Djava.awt.headless=true, -DpipesClientId=0, -Dlog4j.configurationFile=file:///log4j2.xml, -XX:+UseContainerSupport, -XX:MaxRAMPercentage=15, -XX:InitialRAMPercentage=15, org.apache.tika.pipes.PipesServer, /tmp/tika-config.xml, 100000, 300000, 1500000]
>> >
>> > .PipesClient pipesClientId=0: From forked process before start byte: DEBUG [main] 16:25:14,240 org.apache.tika.pipes.PipesServer processing requests
>> > org.apache.tika.parser.ocr.TesseractOCRParser hasTesseract (path: [tesseract]): true
>> > s.PipesServer timer -- initialize parser and other resources: 939 ms
>> > DEBUG [main] 16:25:15,180 org.apache.tika.pipes.PipesServer pipes server initialized
>> >
>> > TRACE [pool-4-thread-1] 16:25:15,206 org.apache.tika.pipes.PipesClient pipesClientId=0: timer -- write tuple: 24 ms
>> > ERROR [pool-3-thread-2] 16:25:15,239 org.apache.tika.pipes.PipesClient pipesClientId=0: execution exception
>> > java.util.concurrent.ExecutionException: java.io.IOException: problem reading response from server: 54
>> >
>> > Caused by: java.lang.IllegalArgumentException: byte with index 83 must be < 17
>> > at org.apache.tika.pipes.PipesServer$STATUS.lookup(PipesServer.java:123)
>> > at org.apache.tika.pipes.PipesClient.readResults(PipesClient.java:291)
>> > ... 5 more
>> > TRACE [pool-3-thread-6] 16:25:15,332 org.apache.tika.pipes.async.AsyncEmitter Nothing on the async queue
>> > DEBUG [pool-3-thread-6] 16:25:15,332 org.apache.tika.pipes.async.AsyncEmitter cache size: (0) bytes and extract count: 0
>> > WARN [pool-3-thread-2] 16:25:15,458 org.apache.tika.pipes.PipesClient pipesClientId=0 crash: path/to/4mb.pdf in 59 ms with exit code 137
>> > TRACE [pool-3-thread-2] 16:25:15,458 org.apache.tika.pipes.async.AsyncProcessor timer -- pipes client process: 1646 ms
>> >
>> > The only clue I have is [..with exit code 137]. It implies OOM, but I
>> > can't see any other evidence: counters, logs, or whatever.
>> >
>> > We can count it as a bug that the failed server isn't propagated as a
>> > failure of TikaAsyncCLI:
>> >
>> > DEBUG [pool-3-thread-6] 16:25:15,813 org.apache.tika.pipes.async.AsyncEmitter emitted: 0 files
>> > DEBUG [pool-3-thread-1] 16:25:15,820 org.apache.tika.pipes.async.AsyncProcessor emitter thread finished, total 1
>> > INFO [main] 16:25:16,313 org.apache.tika.async.cli.TikaAsyncCLI Successfully finished processing 1 files in 3001 ms
>> >
>> > I've tweaked settings a little, memory size etc., but it didn't help.
>> > The same configuration works fine on host Linux w/o a container.
>> >
>> > So I gave up and turned back to the tika-app CLI. FYI.
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>
> --
> Sincerely yours
> Mikhail Khludnev

--
Sincerely yours
Mikhail Khludnev
