Ugh. Thank you for reporting this.

The problem may be that the logger in the forked process is writing to
stdout or stderr (I can't remember which off the top of my head), and in
3.x that stream is the comms channel back to the forking process. We've
fixed this in 4.x.
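
To make the suspected failure mode concrete, here is a minimal Java sketch.
It is not Tika's code: the class StatusChannelDemo, the three-value Status
enum, and the method names are made up for illustration, and only the shape
of the error message is borrowed from the stack trace quoted below. The
parent expects a single status byte from the child; if log text lands on
the same stream first, the first character of that text gets read as the
status. Incidentally, 83 is the ASCII code for 'S', which would be
consistent with text rather than a status byte arriving on the channel.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of a one-byte status protocol; not Tika's actual classes.
class StatusChannelDemo {

    // Assumed status codes for illustration only.
    enum Status { READY, OK, FAILED }

    // Map a raw byte value to a status, rejecting anything out of range.
    static Status lookup(int b) {
        Status[] values = Status.values();
        if (b < 0 || b >= values.length) {
            throw new IllegalArgumentException(
                    "byte with index " + b + " must be < " + values.length);
        }
        return values[b];
    }

    // Read exactly one byte from the child's stream and interpret it as a status.
    static Status readStatus(InputStream fromChild) throws IOException {
        int b = fromChild.read();
        if (b == -1) {
            throw new IOException("child closed the stream");
        }
        return lookup(b);
    }

    public static void main(String[] args) throws IOException {
        // Clean channel: the child wrote only the status byte.
        InputStream clean = new ByteArrayInputStream(new byte[] {1});
        System.out.println("clean channel: " + readStatus(clean));

        // Polluted channel: log text reached the stream before the status byte.
        byte[] polluted = "Some log text\n".getBytes(StandardCharsets.US_ASCII);
        try {
            readStatus(new ByteArrayInputStream(polluted));
        } catch (IllegalArgumentException e) {
            // The first character of the log line was treated as a status code.
            System.out.println("polluted channel: " + e.getMessage());
        }
    }
}
```

If that is what is going on, the parent gives up on the child as soon as it
reads the bad byte, which is why keeping log output off that stream should
help.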

If you change the forked process's logging to write to a file, or to
whichever of the two streams is not the comms channel, you should be OK.

Please let us know how it goes.


On Thu, Feb 19, 2026 at 3:47 PM Mikhail Khludnev <[email protected]> wrote:

> FWIW, just to let you know about the dead end.
>
> I'm a big fan of serverless containers (see TIKA-4529), but I decided to go
> further and use the S3 fetcher and S3 emitter, which led me to TikaAsyncCLI.
> I've put it into Docker with tesseract, etc.
> Finally, it pulls a 4 MB PDF from S3 and spins up the TikaServer JVM, which
> launches those binary tools to check their availability and then just dies:
>
>  org.apache.tika.pipes.PipesClient pipesClientId=0: commandline: [java,
> -cp,
>
> /tika-emitter-s3.jar:/tika-fetcher-s3.jar:/tika-pipes-iterator-s3.jar:/tika-app.jar,
> -Djava.awt.headless=true, -DpipesClientId=0,
> -Dlog4j.configurationFile=file:///log4j2.xml, -XX:+UseContainerSupport,
> -XX:MaxRAMPercentage=15, -XX:InitialRAMPercentage=15,
> org.apache.tika.pipes.PipesServer, /tmp/tika-config.xml, 100000, 300000,
> 1500000]
>
> .PipesClient pipesClientId=0: From forked process before start byte: DEBUG
> [main] 16:25:14,240 org.apache.tika.pipes.PipesServer processing requests
>  org.apache.tika.parser.ocr.TesseractOCRParser hasTesseract (path:
> [tesseract]): true
> s.PipesServer timer -- initialize parser and other resources: 939 ms
> DEBUG [main] 16:25:15,180 org.apache.tika.pipes.PipesServer pipes server
> initialized
>
> TRACE [pool-4-thread-1] 16:25:15,206 org.apache.tika.pipes.PipesClient
> pipesClientId=0: timer -- write tuple: 24 ms
> ERROR [pool-3-thread-2] 16:25:15,239 org.apache.tika.pipes.PipesClient
> pipesClientId=0: execution exception
> java.util.concurrent.ExecutionException: java.io.IOException: problem
> reading response from server: 54
>
> Caused by: java.lang.IllegalArgumentException: byte with index 83 must be <
> 17
>         at
> org.apache.tika.pipes.PipesServer$STATUS.lookup(PipesServer.java:123)
>         at
> org.apache.tika.pipes.PipesClient.readResults(PipesClient.java:291)
>         ... 5 more
> TRACE [pool-3-thread-6] 16:25:15,332
> org.apache.tika.pipes.async.AsyncEmitter Nothing on the async queue
> DEBUG [pool-3-thread-6] 16:25:15,332
> org.apache.tika.pipes.async.AsyncEmitter cache size: (0) bytes and extract
> count: 0
> WARN  [pool-3-thread-2] 16:25:15,458 org.apache.tika.pipes.PipesClient
> pipesClientId=0 crash: path/to/4mb.pdf in 59 ms with exit code 137
> TRACE [pool-3-thread-2] 16:25:15,458
> org.apache.tika.pipes.async.AsyncProcessor timer -- pipes client process:
> 1646 ms
>
> The only clue I have is [..with exit code 137], which implies OOM, but I
> can't see any other evidence: no counters, logs, or anything else.
>
> We can count it as a bug that the failed server isn't propagated as a
> failure of TikaAsyncCLI:
>
> DEBUG [pool-3-thread-6] 16:25:15,813
> org.apache.tika.pipes.async.AsyncEmitter emitted: 0 files
> DEBUG [pool-3-thread-1] 16:25:15,820
> org.apache.tika.pipes.async.AsyncProcessor emitter thread finished, total 1
> INFO  [main] 16:25:16,313 org.apache.tika.async.cli.TikaAsyncCLI
> Successfully finished processing 1 files in 3001 ms
>
> I've tweaked the settings a little (memory size, etc.), but it didn't help.
> The same configuration works fine on a Linux host without a container.
>
> So I gave up and went back to the tika-app CLI. FYI.
> --
> Sincerely yours
> Mikhail Khludnev
>
