Switching the logging back to STDERR fixed the run in Docker.
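
For anyone finding this thread later, here is a minimal Java sketch of why log text on the child's comms stream can break the parent, along the lines Tim describes in the quoted message below. It is not Tika's code: the class `StatusByteDemo`, its methods, and the single-byte framing are assumptions made for illustration. Only the limit of 17 and the wording of the error message are taken from the stack trace quoted below. The idea is that the parent expects a small status byte, but if a log line reaches the stream first, its first character is read instead. The letter 'S' is ASCII 83, which would be consistent with "byte with index 83 must be < 17", though I have not checked which text actually hit the channel.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StatusByteDemo {

    // Assumed number of valid status codes; 17 comes from the error message in this thread.
    static final int STATUS_COUNT = 17;

    // Reads one byte and treats it as a status index, rejecting anything out of range.
    static int readStatus(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new IOException("stream closed before a status byte arrived");
        }
        if (b >= STATUS_COUNT) {
            throw new IllegalArgumentException(
                    "byte with index " + b + " must be < " + STATUS_COUNT);
        }
        return b;
    }

    // Returns "status N" for a valid byte, or the exception message otherwise.
    static String describe(byte[] channel) {
        try {
            return "status " + readStatus(new ByteArrayInputStream(channel));
        } catch (IOException | IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Clean channel: the child wrote only a protocol byte.
        System.out.println(describe(new byte[] {3}));
        // Polluted channel: a (hypothetical) log line reached the same stream first.
        byte[] polluted = "Some log line\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(describe(polluted));
    }
}
```

Run as is, this prints `status 3` and then `byte with index 83 must be < 17`. Keeping the child's log output off the protocol stream (a file, or whichever console stream is not used for comms) avoids the problem, which is what Tim suggests below.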

On Fri, Feb 20, 2026 at 2:21 PM Mikhail Khludnev <[email protected]> wrote:

> Hold on. It might be the difference in log4j2.xml. I've changed logs from
> stderr to stdout. That might be a problem. Checking.
>
> On Fri, Feb 20, 2026 at 11:06 AM Mikhail Khludnev <[email protected]> wrote:
>
>> Thanks for the clue, Tim. BUT.
>>
>> The same 3.x-SNAPSHOT jars and configs work smoothly on a bare Linux host
>> (taking Docker out of the equation):
>>
>> DEBUG [pool-3-thread-4] 17:01:27,928 org.apache.tika.pipes.PipesClient
>> pipesClientId=3: commandline: [java, -cp,
>> Downloads/tika-emitter-s3-3.3.0-SNAPSHOT.jar:Downloads/tika-pipes-iterator-s3-3.3.0-SNAPSHOT.jar:Downloads/tika-app-3.3.0-jdk21-SNAPSHOT.jar:Downloads/tika-fetcher-s3-3.3.0-SNAPSHOT.jar,
>> -Djava.awt.headless=true, -DpipesClientId=3,
>> -Dlog4j.configurationFile=log4j2.xml, org.apache.tika.pipes.PipesServer,
>> /home/mikhail-khludnev/git/norn-budget-control-demo/v-conversion-tikamd-job/docker/tika-config.xml,
>> 100000, 300000, 1500000]
>> DEBUG [pool-3-thread-1] 17:01:27,979
>> org.apache.tika.pipes.async.AsyncProcessor fetchEmitWorker finished, total 1
>> DEBUG [pool-3-thread-6] 17:01:27,989
>> org.apache.tika.pipes.async.AsyncEmitter cache size: (0) bytes and extract
>> count: 0
>> 2026-02-19T14:01:28.290084314Z main DEBUG Apache Tika application 3.3
>> initializing configuration
>> XmlConfiguration[location=/home/mikhail-khludnev/log4j2.xml,
>> lastModified=2026-02-17T23:05:22.608Z]
>> ...
>> DEBUG [main] 17:01:29,263 org.apache.tika.pipes.PipesServer pipes server
>> initialized
>> DEBUG [main] 17:01:29,303 org.apache.tika.pipes.fetcher.s3.S3Fetcher
>> about to fetch fetchkey=path/to/4Mb.pdf from bucket (test-bucket)
>>
>> However, after I redirected the child server process logs to a log file, the
>> Docker run passed! Thanks, Tim. I wonder how it works and how the container
>> environment affects console redirection.
>>
>> Looking forward to the 3.3 release with Markdown! More generally, could you
>> share your view on using Tika in short-lived containers, and whether it makes
>> sense at all? Which should one choose: the Tika app CLI in batch mode or
>> TikaPipes? Thank you twice!
>>
>>
>> On Fri, Feb 20, 2026 at 3:52 AM Tim Allison <[email protected]> wrote:
>>
>>> Ugh. Thank you for reporting this.
>>>
>>> The problem may be that the logger from the forked process is writing to
>>> stdout or stderr (can’t remember off the top of my head), which is the comms
>>> channel to the forking process in 3.x. We’ve fixed this in 4.x.
>>>
>>> If you change the forked process's logging to write to a file or to the
>>> other stream, you should be OK.
>>>
>>> Please let us know how it goes.
>>>
>>>
>>> On Thu, Feb 19, 2026 at 3:47 PM Mikhail Khludnev <[email protected]>
>>> wrote:
>>>
>>> > FWIW, just to let you know about a dead end.
>>> >
>>> > I'm a big fan of serverless containers (see TIKA-4529), but I decided to go
>>> > further and use the S3 fetcher and S3 emitter, which led me to TikaAsyncCLI.
>>> > I've put it into Docker with tesseract, etc.
>>> > In the end, it pulls a 4 MB PDF from S3 and spins up the TikaServer JVM,
>>> > which launches those binary tools to check their availability and then
>>> > just dies:
>>> >
>>> >  org.apache.tika.pipes.PipesClient pipesClientId=0: commandline: [java,
>>> > -cp,
>>> > /tika-emitter-s3.jar:/tika-fetcher-s3.jar:/tika-pipes-iterator-s3.jar:/tika-app.jar,
>>> > -Djava.awt.headless=true, -DpipesClientId=0,
>>> > -Dlog4j.configurationFile=file:///log4j2.xml, -XX:+UseContainerSupport,
>>> > -XX:MaxRAMPercentage=15, -XX:InitialRAMPercentage=15,
>>> > org.apache.tika.pipes.PipesServer, /tmp/tika-config.xml, 100000, 300000,
>>> > 1500000]
>>> >
>>> > .PipesClient pipesClientId=0: From forked process before start byte: DEBUG
>>> > [main] 16:25:14,240 org.apache.tika.pipes.PipesServer processing requests
>>> >  org.apache.tika.parser.ocr.TesseractOCRParser hasTesseract (path:
>>> > [tesseract]): true
>>> > s.PipesServer timer -- initialize parser and other resources: 939 ms
>>> > DEBUG [main] 16:25:15,180 org.apache.tika.pipes.PipesServer pipes server
>>> > initialized
>>> >
>>> > TRACE [pool-4-thread-1] 16:25:15,206 org.apache.tika.pipes.PipesClient
>>> > pipesClientId=0: timer -- write tuple: 24 ms
>>> > ERROR [pool-3-thread-2] 16:25:15,239 org.apache.tika.pipes.PipesClient
>>> > pipesClientId=0: execution exception
>>> > java.util.concurrent.ExecutionException: java.io.IOException: problem
>>> > reading response from server: 54
>>> >
>>> > Caused by: java.lang.IllegalArgumentException: byte with index 83 must be < 17
>>> >         at
>>> > org.apache.tika.pipes.PipesServer$STATUS.lookup(PipesServer.java:123)
>>> >         at
>>> > org.apache.tika.pipes.PipesClient.readResults(PipesClient.java:291)
>>> >         ... 5 more
>>> > TRACE [pool-3-thread-6] 16:25:15,332
>>> > org.apache.tika.pipes.async.AsyncEmitter Nothing on the async queue
>>> > DEBUG [pool-3-thread-6] 16:25:15,332
>>> > org.apache.tika.pipes.async.AsyncEmitter cache size: (0) bytes and extract
>>> > count: 0
>>> > WARN  [pool-3-thread-2] 16:25:15,458 org.apache.tika.pipes.PipesClient
>>> > pipesClientId=0 crash: path/to/4mb.pdf in 59 ms with exit code 137
>>> > TRACE [pool-3-thread-2] 16:25:15,458
>>> > org.apache.tika.pipes.async.AsyncProcessor timer -- pipes client process:
>>> > 1646 ms
>>> >
>>> > The only clue I have is [..with exit code 137], which implies OOM, but I
>>> > can't see any other evidence: no counters, no logs, nothing.
>>> >
>>> > We can count it as a bug that the failed server isn't propagated as a
>>> > failure of TikaAsyncCLI:
>>> >
>>> > DEBUG [pool-3-thread-6] 16:25:15,813
>>> > org.apache.tika.pipes.async.AsyncEmitter emitted: 0 files
>>> > DEBUG [pool-3-thread-1] 16:25:15,820
>>> > org.apache.tika.pipes.async.AsyncProcessor emitter thread finished, total 1
>>> > INFO  [main] 16:25:16,313 org.apache.tika.async.cli.TikaAsyncCLI
>>> > Successfully finished processing 1 files in 3001 ms
>>> >
>>> > I've tweaked the settings a little (memory size, etc.), but it didn't help.
>>> > The same configuration works fine on the host Linux without a container.
>>> >
>>> > So I gave up and went back to the tika-app CLI. FYI.
>>> > --
>>> > Sincerely yours
>>> > Mikhail Khludnev
>>> >
>>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev
