Logging to STDERR fixed the Docker run.

On Fri, Feb 20, 2026 at 2:21 PM Mikhail Khludnev <[email protected]> wrote:
> Hold on. It might be the difference in log4j2.xml. I've changed logs from stderr to stdout. That might be a problem. Checking.
>
> On Fri, Feb 20, 2026 at 11:06 AM Mikhail Khludnev <[email protected]> wrote:
>
>> Thanks for the clue, Tim. BUT.
>>
>> The same 3.x-SNAPSHOT JARs and configs work smoothly on a bare Linux host (taking Docker out of the equation):
>>
>> DEBUG [pool-3-thread-4] 17:01:27,928 org.apache.tika.pipes.PipesClient pipesClientId=3: commandline: [java, -cp, Downloads/tika-emitter-s3-3.3.0-SNAPSHOT.jar:Downloads/tika-pipes-iterator-s3-3.3.0-SNAPSHOT.jar:Downloads/tika-app-3.3.0-jdk21-SNAPSHOT.jar:Downloads/tika-fetcher-s3-3.3.0-SNAPSHOT.jar, -Djava.awt.headless=true, -DpipesClientId=3, -Dlog4j.configurationFile=log4j2.xml, org.apache.tika.pipes.PipesServer, /home/mikhail-khludnev/git/norn-budget-control-demo/v-conversion-tikamd-job/docker/tika-config.xml, 100000, 300000, 1500000]
>> DEBUG [pool-3-thread-1] 17:01:27,979 org.apache.tika.pipes.async.AsyncProcessor fetchEmitWorker finished, total 1
>> DEBUG [pool-3-thread-6] 17:01:27,989 org.apache.tika.pipes.async.AsyncEmitter cache size: (0) bytes and extract count: 0
>> 2026-02-19T14:01:28.290084314Z main DEBUG Apache Tika application 3.3 initializing configuration XmlConfiguration[location=/home/mikhail-khludnev/log4j2.xml, lastModified=2026-02-17T23:05:22.608Z]
>> ...
>> DEBUG [main] 17:01:29,263 org.apache.tika.pipes.PipesServer pipes server initialized
>> DEBUG [main] 17:01:29,303 org.apache.tika.pipes.fetcher.s3.S3Fetcher about to fetch fetchkey=path/to/4Mb.pdf from bucket (test-bucket)
>>
>> However, after I directed the child server process logs to a log file, the Docker run passed! Thanks, Tim. I'm wondering how it works and how the container environment affects console redirection.
>>
>> Looking forward to the 3.3 release with Markdown! Overall, could you share your vision of using Tika in short-lived containers, and whether it makes sense at all?
>> Which should I choose: the Tika app CLI in batch mode or TikaPipes?
>> Thank you twice!
>>
>> On Fri, Feb 20, 2026 at 3:52 AM Tim Allison <[email protected]> wrote:
>>
>>> Ugh. Thank you for reporting this.
>>>
>>> The problem may be that the logger in the forked process is writing to stdout or stderr (can't remember off the top of my head), which is the comms channel in 3.x to the forking process. We've fixed this in 4.x.
>>>
>>> If you modify the forked process's logging to write to a file or to the other stream, you should be OK.
>>>
>>> Please let us know how it goes.
>>>
>>> On Thu, Feb 19, 2026 at 3:47 PM Mikhail Khludnev <[email protected]> wrote:
>>>
>>> > FWIW, just to let you know about the dead end.
>>> >
>>> > I'm a big fan of serverless containers (see TIKA-4529), but I decided to go further and use the S3 fetcher and S3 emitter, which led me to TikaAsyncCLI. I've put it into Docker with Tesseract, etc.
>>> > Finally, it pulls a 4 MB PDF from S3 and spins off the TikaServer JVM, which launches those binary tools to check their availability and then just dies:
>>> >
>>> > org.apache.tika.pipes.PipesClient pipesClientId=0: commandline: [java, -cp, /tika-emitter-s3.jar:/tika-fetcher-s3.jar:/tika-pipes-iterator-s3.jar:/tika-app.jar, -Djava.awt.headless=true, -DpipesClientId=0, -Dlog4j.configurationFile=file:///log4j2.xml, -XX:+UseContainerSupport, -XX:MaxRAMPercentage=15, -XX:InitialRAMPercentage=15, org.apache.tika.pipes.PipesServer, /tmp/tika-config.xml, 100000, 300000, 1500000]
>>> >
>>> > .PipesClient pipesClientId=0: From forked process before start byte: DEBUG [main] 16:25:14,240 org.apache.tika.pipes.PipesServer processing requests
>>> > org.apache.tika.parser.ocr.TesseractOCRParser hasTesseract (path: [tesseract]): true
>>> > s.PipesServer timer -- initialize parser and other resources: 939 ms
>>> > DEBUG [main] 16:25:15,180 org.apache.tika.pipes.PipesServer pipes server initialized
>>> >
>>> > TRACE [pool-4-thread-1] 16:25:15,206 org.apache.tika.pipes.PipesClient pipesClientId=0: timer -- write tuple: 24 ms
>>> > ERROR [pool-3-thread-2] 16:25:15,239 org.apache.tika.pipes.PipesClient pipesClientId=0: execution exception
>>> > java.util.concurrent.ExecutionException: java.io.IOException: problem reading response from server: 54
>>> >
>>> > Caused by: java.lang.IllegalArgumentException: byte with index 83 must be < 17
>>> >         at org.apache.tika.pipes.PipesServer$STATUS.lookup(PipesServer.java:123)
>>> >         at org.apache.tika.pipes.PipesClient.readResults(PipesClient.java:291)
>>> >         ... 5 more
>>> > TRACE [pool-3-thread-6] 16:25:15,332 org.apache.tika.pipes.async.AsyncEmitter Nothing on the async queue
>>> > DEBUG [pool-3-thread-6] 16:25:15,332 org.apache.tika.pipes.async.AsyncEmitter cache size: (0) bytes and extract count: 0
>>> > WARN [pool-3-thread-2] 16:25:15,458 org.apache.tika.pipes.PipesClient pipesClientId=0 crash: path/to/4mb.pdf in 59 ms with exit code 137
>>> > TRACE [pool-3-thread-2] 16:25:15,458 org.apache.tika.pipes.async.AsyncProcessor timer -- pipes client process: 1646 ms
>>> >
>>> > The only clue I have is [..with exit code 137], which implies OOM, but I can't see any other evidence: counters, logs, or whatever.
>>> >
>>> > We can count it as a bug that the failed server isn't propagated as a failure of TikaAsyncCLI:
>>> >
>>> > DEBUG [pool-3-thread-6] 16:25:15,813 org.apache.tika.pipes.async.AsyncEmitter emitted: 0 files
>>> > DEBUG [pool-3-thread-1] 16:25:15,820 org.apache.tika.pipes.async.AsyncProcessor emitter thread finished, total 1
>>> > INFO [main] 16:25:16,313 org.apache.tika.async.cli.TikaAsyncCLI Successfully finished processing 1 files in 3001 ms
>>> >
>>> > I've tweaked the settings a little (memory size, etc.), but it doesn't help. The same configuration works fine on a Linux host without a container.
>>> >
>>> > So, I gave up and went back to the tika-app CLI. FYI.
>>> > --
>>> > Sincerely yours
>>> > Mikhail Khludnev
>>> >
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
--
Sincerely yours
Mikhail Khludnev
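A minimal, hypothetical Java sketch of the failure mode Tim describes: a parent process reads a single status byte from the child's output stream and looks it up in a fixed status table, so any log text the child writes to that same stream gets misread as a status code. The 17-entry `Status` enum, the `readStatus` helper, and the class name are assumptions made up for illustration (only the "< 17" bound is taken from the trace above); this is not Tika's actual `PipesClient`/`PipesServer` code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class SharedStdoutDemo {

    // Hypothetical status codes; only the count (17) mirrors the "< 17" in the trace.
    enum Status {
        S0, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, S13, S14, S15, S16;

        // Map a raw byte to a status, rejecting anything outside the table.
        static Status lookup(int b) {
            Status[] values = values();
            if (b < 0 || b >= values.length) {
                throw new IllegalArgumentException(
                        "byte with index " + b + " must be < " + values.length);
            }
            return values[b];
        }
    }

    // Parent side (hypothetical): read exactly one status byte from the child's stream.
    static Status readStatus(InputStream fromChild) throws IOException {
        int b = fromChild.read();
        if (b == -1) {
            throw new IOException("child closed the stream");
        }
        return Status.lookup(b);
    }

    public static void main(String[] args) throws IOException {
        // Clean channel: the child writes only protocol bytes.
        ByteArrayOutputStream clean = new ByteArrayOutputStream();
        clean.write(3);
        System.out.println(readStatus(new ByteArrayInputStream(clean.toByteArray())));

        // Polluted channel: a log line lands on the same stream before the status byte.
        ByteArrayOutputStream polluted = new ByteArrayOutputStream();
        polluted.write("Some log line\n".getBytes(StandardCharsets.US_ASCII));
        polluted.write(3);
        try {
            readStatus(new ByteArrayInputStream(polluted.toByteArray()));
        } catch (IllegalArgumentException e) {
            // The first byte read is ASCII 'S' (83), not the intended status.
            System.out.println(e.getMessage());
        }
    }
}
```

Running it prints `S3` and then `byte with index 83 must be < 17`. The value 83 happens to be ASCII `S`, which fits the idea that the parent read text from a log line where it expected a protocol byte. Pointing the child's log output at stderr or a file, as done in this thread, keeps the channel clean.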
