[ https://issues.apache.org/jira/browse/TIKA-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tino Schöllhorn updated TIKA-4422:
----------------------------------
    Summary: Availability problem with TikaServer 3.1.0  (was: Availability problem with TikaServer 3.1.0 List-ID:<user.tika.apache.org>)

> Availability problem with TikaServer 3.1.0
> ------------------------------------------
>
>                 Key: TIKA-4422
>                 URL: https://issues.apache.org/jira/browse/TIKA-4422
>             Project: Tika
>          Issue Type: Bug
>          Components: tika-server
>    Affects Versions: 3.1.0
>         Environment: Java21
>                      Ubuntu22
>            Reporter: Tino Schöllhorn
>            Priority: Major
>
> Hi,
> we have a problem when running the TikaServer. We use Tika 3.1.0 on Ubuntu
> with Java21.
> Previously, we used Tika 2.4.x - there we could not observe this problem.
> We run a *lot* of text-extraction requests. After a few hours (8-10h) Tika is
> not able to restart its worker processes.
> Tika runs via systemd and via journalctl we see the following output:
>
> {noformat}
> May 28 04:39:39 dss-index java[350084]: INFO [pool-2-thread-1] 04:39:39,752 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 3
> May 28 04:39:40 dss-index java[376963]: May 28, 2025 4:39:40 AM org.apache.cxf.endpoint.ServerImpl initDestination
> May 28 04:39:40 dss-index java[376963]: INFO: Setting the server's publish address to be http://localhost:9998/
> May 28 05:35:32 dss-index java[350084]: INFO [pool-2-thread-1] 05:35:32,896 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 2
> May 28 05:35:34 dss-index java[377213]: May 28, 2025 5:35:34 AM org.apache.cxf.endpoint.ServerImpl initDestination
> May 28 05:35:34 dss-index java[377213]: INFO: Setting the server's publish address to be http://localhost:9998/
> {noformat}
> After these messages the TikaServer does not respond to requests any more. A
> restart of the Tika-Parent process is the only thing which helps.
> The error messages are emitted in TikaServerWatchDog:161. Yet, I do not
> understand what is going wrong here. Probably the messages are error
> messages from the OS. perror gives the following output:
> {noformat}
> OS error code 2: No such file or directory
> OS error code 3: No such process
> {noformat}
> Yet, it is unclear to me, what happens. Below you'll find the tika.config.
> As far as I understand the situation this seems a bug which has been
> introduced sometime between version 2.4.x and 3.1.0.
> Hope that someone has an idea what is going on and how this can be remedied.
> Tino
> – tika.config.start
> {code:java}
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>   <parsers>
>     <parser class="org.apache.tika.parser.DefaultParser">
>     </parser>
>   </parsers>
>   <server>
>     <params>
>       <port>9998</port>
>       <host>localhost</host>
>       <digest>sha256</digest>
>       <digestMarkLimit>1000000</digestMarkLimit>
>       <id></id>
>       <cors>NONE</cors>
>       <logLevel>info</logLevel>
>       <returnStackTrace>false</returnStackTrace>
>       <noFork>false</noFork>
>       <taskTimeoutMillis>300000</taskTimeoutMillis>
>       <maxForkedStartupMillis>120000</maxForkedStartupMillis>
>       <maxRestarts>-1</maxRestarts>
>       <maxFiles>25000</maxFiles>
>       <javaPath>java</javaPath>
>       <forkedJvmArgs>
>         <arg>-Xms4g</arg>
>         <arg>-Xmx4g</arg>
>         <arg>-Dlog4j.configurationFile=tika-forked-log4j2.xml</arg>
>       </forkedJvmArgs>
>       <enableUnsecureFeatures>false</enableUnsecureFeatures>
>       <endpoints>
>         <endpoint>status</endpoint>
>         <endpoint>tika</endpoint>
>         <endpoint>rmeta</endpoint>
>         <endpoint>language</endpoint>
>       </endpoints>
>     </params>
>   </server>
> </properties>
> {code}
> – tika.config.stop

--
This message was sent by Atlassian Jira
(v8.20.10#820010)