List-ID: <user.tika.apache.org>

Tino Schöllhorn created TIKA-4422:
-------------------------------------

Summary: Availability problem with TikaServer 3.1.0
Key: TIKA-4422
URL: https://issues.apache.org/jira/browse/TIKA-4422
Project: Tika
Issue Type: Bug
Components: tika-server
Affects Versions: 3.1.0
Environment: Java21
Ubuntu22
Reporter: Tino Schöllhorn

Hi,

we have a problem when running the TikaServer. We use Tika 3.1.0 on Ubuntu with Java21. Previously we used Tika 2.4.x, where we did not observe this problem.

We run a *lot* of text-extraction requests. After a few hours (8-10h) Tika is no longer able to restart its worker processes. Tika runs via systemd, and via journalctl we see the following output:

-- journalctl.start
May 28 04:39:39 dss-index java[350084]: INFO [pool-2-thread-1] 04:39:39,752 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 3
May 28 04:39:40 dss-index java[376963]: May 28, 2025 4:39:40 AM org.apache.cxf.endpoint.ServerImpl initDestination
May 28 04:39:40 dss-index java[376963]: INFO: Setting the server's publish address to be http://localhost:9998/
May 28 05:35:32 dss-index java[350084]: INFO [pool-2-thread-1] 05:35:32,896 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 2
May 28 05:35:34 dss-index java[377213]: May 28, 2025 5:35:34 AM org.apache.cxf.endpoint.ServerImpl initDestination
May 28 05:35:34 dss-index java[377213]: INFO: Setting the server's publish address to be http://localhost:9998/
-- journalctl.end

After these messages the TikaServer does not respond to requests any more. Restarting the Tika parent process is the only thing that helps.

The messages are emitted in TikaServerWatchDog:161, but I do not understand what is going wrong here. Presumably the exit values are error codes from the OS. perror gives the following output:

OS error code 2: No such file or directory
OS error code 3: No such process

Still, it is unclear to me what actually happens. Below you'll find the tika.config.

As far as I understand the situation, this looks like a bug that was introduced sometime between version 2.4.x and 3.1.0.

I hope that someone has an idea what is going on and how this can be remedied.
Tino

-- tika.config.start
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
    </parser>
  </parsers>
  <server>
    <params>
      <port>9998</port>
      <host>localhost</host>
      <digest>sha256</digest>
      <digestMarkLimit>1000000</digestMarkLimit>
      <id></id>
      <cors>NONE</cors>
      <logLevel>info</logLevel>
      <returnStackTrace>false</returnStackTrace>
      <noFork>false</noFork>
      <taskTimeoutMillis>300000</taskTimeoutMillis>
      <maxForkedStartupMillis>120000</maxForkedStartupMillis>
      <maxRestarts>-1</maxRestarts>
      <maxFiles>25000</maxFiles>
      <javaPath>java</javaPath>
      <forkedJvmArgs>
        <arg>-Xms4g</arg>
        <arg>-Xmx4g</arg>
        <arg>-Dlog4j.configurationFile=tika-forked-log4j2.xml</arg>
      </forkedJvmArgs>
      <enableUnsecureFeatures>false</enableUnsecureFeatures>
      <endpoints>
        <endpoint>status</endpoint>
        <endpoint>tika</endpoint>
        <endpoint>rmeta</endpoint>
        <endpoint>language</endpoint>
      </endpoints>
    </params>
  </server>
</properties>
-- tika.config.stop
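
Since the configuration enables the status endpoint on localhost:9998, the hung state described in this report could in principle be detected from outside with a small probe. The following is only a minimal sketch in Python, not something taken from Tika itself: the function name tika_is_responsive and the timeout default are made up for illustration, and it assumes the status endpoint is served at the path /status.

```python
import http.client
import urllib.error
import urllib.request


def tika_is_responsive(base_url="http://localhost:9998", timeout_s=5.0):
    """Return True if GET <base_url>/status answers with HTTP 200 in time.

    Assumption: the 'status' endpoint from the config is reachable at /status.
    """
    # The target is local, so ignore any proxy settings from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url}/status", timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status or broken response:
        # all of these are treated as "not responsive".
        return False
```

A cron job or systemd timer could call this function periodically and restart the Tika parent process when it returns False. That would only work around the symptom and does not explain why the forked process stops answering.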