Hold on. It might be the difference in log4j2.xml. I've changed logs from
stderr to stdout. That might be a problem. Checking.
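
For illustration, here is a minimal sketch of how I understand the failure Tim describes below: the parent reads a single status byte from the child's stdout and rejects anything outside the known range. This is not Tika's actual code. The class name, the enum, and its three values are hypothetical; only the shape of the error message is taken from my stack trace. 83 happens to be ASCII 'S', which would fit a stray text byte on the channel, though I have not verified which log line it came from.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StatusByteDemo {

    // Hypothetical status codes, NOT the real PipesServer.STATUS values.
    enum Status { READY, OK, FAILED }

    // Map a raw byte from the child's stdout to a status, rejecting unknown values.
    static Status lookup(int b) {
        Status[] values = Status.values();
        if (b < 0 || b >= values.length) {
            throw new IllegalArgumentException(
                    "byte with index " + b + " must be < " + values.length);
        }
        return values[b];
    }

    // Read exactly one status byte from the stream shared with the child process.
    static Status readStatus(InputStream fromChild) throws IOException {
        int b = fromChild.read();
        if (b == -1) {
            throw new IOException("child closed the stream");
        }
        return lookup(b);
    }

    public static void main(String[] args) throws IOException {
        // Clean channel: the child wrote only the status byte.
        InputStream clean = new ByteArrayInputStream(new byte[] {1});
        System.out.println(readStatus(clean));

        // Polluted channel: a logger wrote text to the same stream first.
        InputStream polluted =
                new ByteArrayInputStream("S".getBytes(StandardCharsets.US_ASCII));
        try {
            readStatus(polluted);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this picture is right, any logger in the child that writes to the shared stream can corrupt the protocol, which is why sending the child's logs to a file (or to the other stream) should avoid it.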

On Fri, Feb 20, 2026 at 11:06 AM Mikhail Khludnev <[email protected]> wrote:

> Thanks for the clue, Tim. BUT.
>
> The same 3.x-SNAPSHOT jars and configs work smoothly on a bare Linux host
> (taking Docker out of the equation):
>
> DEBUG [pool-3-thread-4] 17:01:27,928 org.apache.tika.pipes.PipesClient
> pipesClientId=3: commandline: [java, -cp,
> Downloads/tika-emitter-s3-3.3.0-SNAPSHOT.jar:Downloads/tika-pipes-iterator-s3-3.3.0-SNAPSHOT.jar:Downloads/tika-app-3.3.0-jdk21-SNAPSHOT.jar:Downloads/tika-fetcher-s3-3.3.0-SNAPSHOT.jar,
> -Djava.awt.headless=true, -DpipesClientId=3,
> -Dlog4j.configurationFile=log4j2.xml, org.apache.tika.pipes.PipesServer,
> /home/mikhail-khludnev/git/norn-budget-control-demo/v-conversion-tikamd-job/docker/tika-config.xml,
> 100000, 300000, 1500000]
> DEBUG [pool-3-thread-1] 17:01:27,979
> org.apache.tika.pipes.async.AsyncProcessor fetchEmitWorker finished, total 1
> DEBUG [pool-3-thread-6] 17:01:27,989
> org.apache.tika.pipes.async.AsyncEmitter cache size: (0) bytes and extract
> count: 0
> 2026-02-19T14:01:28.290084314Z main DEBUG Apache Tika application 3.3
> initializing configuration
> XmlConfiguration[location=/home/mikhail-khludnev/log4j2.xml,
> lastModified=2026-02-17T23:05:22.608Z]
> ...
> DEBUG [main] 17:01:29,263 org.apache.tika.pipes.PipesServer pipes server
> initialized
> DEBUG [main] 17:01:29,303 org.apache.tika.pipes.fetcher.s3.S3Fetcher about
> to fetch fetchkey=path/to/4Mb.pdf from bucket (test-bucket)
>
> However, after I redirected the child server process logs to a log file,
> the Docker run passed! Thanks, Tim. I wonder how it works and how the
> container environment affects console redirection.
>
> Looking forward to the 3.3 release with Markdown! Overall, could you share
> your vision for using Tika in short-lived containers, and whether it makes
> sense at all? Which should I choose: the Tika app CLI in batch mode or
> TikaPipes?
> Thank you twice!
>
>
> On Fri, Feb 20, 2026 at 3:52 AM Tim Allison <[email protected]> wrote:
>
>> Ugh. Thank you for reporting this.
>>
>> The problem may be that the logger from the forked process is writing to
>> stdout or stderr (can’t remember off the top of my head), which is the
>> comms channel in 3.x to the forking process. We’ve fixed this in 4.x.
>>
>> If you modify the forked process logging to write to a file or to the
>> other stream, you should be ok.
>>
>> Please let us know how it goes.
>>
>>
>> On Thu, Feb 19, 2026 at 3:47 PM Mikhail Khludnev <[email protected]> wrote:
>>
>> > FWIW, just to let you know about a dead end.
>> >
>> > I'm a big fan of serverless containers (see TIKA-4529), but I decided to
>> > go further and use the S3 fetcher and S3 emitter, which led me to
>> > TikaAsyncCLI. I've put it into Docker with tesseract, etc.
>> > Finally, it pulls a 4 MB PDF from S3, spins up the TikaServer JVM, which
>> > launches those binary tools to check their availability, and just dies:
>> >
>> >  org.apache.tika.pipes.PipesClient pipesClientId=0: commandline: [java,
>> > -cp,
>> >
>> >
>> /tika-emitter-s3.jar:/tika-fetcher-s3.jar:/tika-pipes-iterator-s3.jar:/tika-app.jar,
>> > -Djava.awt.headless=true, -DpipesClientId=0,
>> > -Dlog4j.configurationFile=file:///log4j2.xml, -XX:+UseContainerSupport,
>> > -XX:MaxRAMPercentage=15, -XX:InitialRAMPercentage=15,
>> > org.apache.tika.pipes.PipesServer, /tmp/tika-config.xml, 100000, 300000,
>> > 1500000]
>> >
>> > .PipesClient pipesClientId=0: From forked process before start byte:
>> DEBUG
>> > [main] 16:25:14,240 org.apache.tika.pipes.PipesServer processing
>> requests
>> >  org.apache.tika.parser.ocr.TesseractOCRParser hasTesseract (path:
>> > [tesseract]): true
>> > s.PipesServer timer -- initialize parser and other resources: 939 ms
>> > DEBUG [main] 16:25:15,180 org.apache.tika.pipes.PipesServer pipes server
>> > initialized
>> >
>> > TRACE [pool-4-thread-1] 16:25:15,206 org.apache.tika.pipes.PipesClient
>> > pipesClientId=0: timer -- write tuple: 24 ms
>> > ERROR [pool-3-thread-2] 16:25:15,239 org.apache.tika.pipes.PipesClient
>> > pipesClientId=0: execution exception
>> > java.util.concurrent.ExecutionException: java.io.IOException: problem
>> > reading response from server: 54
>> >
>> > Caused by: java.lang.IllegalArgumentException: byte with index 83 must be < 17
>> >         at
>> > org.apache.tika.pipes.PipesServer$STATUS.lookup(PipesServer.java:123)
>> >         at
>> > org.apache.tika.pipes.PipesClient.readResults(PipesClient.java:291)
>> >         ... 5 more
>> > TRACE [pool-3-thread-6] 16:25:15,332
>> > org.apache.tika.pipes.async.AsyncEmitter Nothing on the async queue
>> > DEBUG [pool-3-thread-6] 16:25:15,332
>> > org.apache.tika.pipes.async.AsyncEmitter cache size: (0) bytes and
>> extract
>> > count: 0
>> > WARN  [pool-3-thread-2] 16:25:15,458 org.apache.tika.pipes.PipesClient
>> > pipesClientId=0 crash: path/to/4mb.pdf in 59 ms with exit code 137
>> > TRACE [pool-3-thread-2] 16:25:15,458
>> > org.apache.tika.pipes.async.AsyncProcessor timer -- pipes client
>> process:
>> > 1646 ms
>> >
>> > The only clue I have is [..with exit code 137], which implies OOM, but I
>> > can't see any other evidence: no counters, no logs, nothing.
>> >
>> > We can count it as a bug that the failed server isn't propagated as a
>> > failure of TikaAsyncCLI:
>> >
>> > DEBUG [pool-3-thread-6] 16:25:15,813
>> > org.apache.tika.pipes.async.AsyncEmitter emitted: 0 files
>> > DEBUG [pool-3-thread-1] 16:25:15,820
>> > org.apache.tika.pipes.async.AsyncProcessor emitter thread finished,
>> total 1
>> > INFO  [main] 16:25:16,313 org.apache.tika.async.cli.TikaAsyncCLI
>> > Successfully finished processing 1 files in 3001 ms
>> >
>> > I've tweaked the settings a little (memory size, etc.), but it didn't
>> > help. The same configuration works fine on a Linux host without a
>> > container.
>> >
>> > So I gave up and went back to the tika-app CLI. FYI.
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>> >
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev
