Tino Schöllhorn created TIKA-4422:
-------------------------------------

             Summary: Availability problem with TikaServer 3.1.0 
                 Key: TIKA-4422
                 URL: https://issues.apache.org/jira/browse/TIKA-4422
             Project: Tika
          Issue Type: Bug
          Components: tika-server
    Affects Versions: 3.1.0
         Environment: Java21
                      Ubuntu22
            Reporter: Tino Schöllhorn


Hi,

we have a problem running TikaServer. We use Tika 3.1.0 on Ubuntu with 
Java21. Previously we used Tika 2.4.x, where we did not observe this problem. 

We run a *lot* of text-extraction requests. After a few hours (8-10h), Tika is 
no longer able to restart its worker processes. 
Tika runs under systemd, and journalctl shows the following output:

-- journalctl.start
May 28 04:39:39 dss-index java[350084]: INFO  [pool-2-thread-1] 04:39:39,752 
org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit 
value 3
May 28 04:39:40 dss-index java[376963]: May 28, 2025 4:39:40 AM 
org.apache.cxf.endpoint.ServerImpl initDestination
May 28 04:39:40 dss-index java[376963]: INFO: Setting the server's publish 
address to be http://localhost:9998/
May 28 05:35:32 dss-index java[350084]: INFO  [pool-2-thread-1] 05:35:32,896 
org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit 
value 2
May 28 05:35:34 dss-index java[377213]: May 28, 2025 5:35:34 AM 
org.apache.cxf.endpoint.ServerImpl initDestination
May 28 05:35:34 dss-index java[377213]: INFO: Setting the server's publish 
address to be http://localhost:9998/
-- journalctl.end
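
To see how often each exit value shows up over a longer period, the journal 
could be tallied with a small helper. The following is only a sketch: the class 
name WatchDogExitTally is made up for illustration, and the pattern assumes the 
watchdog message keeps exactly the wording shown in the excerpt above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: counts watchdog exit values in journal text.
final class WatchDogExitTally {

    // Wording taken from the journal excerpt; \s+ also matches a line wrap
    // between "exit" and "value".
    private static final Pattern EXIT =
            Pattern.compile("forked process exited with exit\\s+value\\s+(\\d+)");

    // Returns a map from exit value to number of occurrences, sorted by exit value.
    static Map<Integer, Integer> tally(String journal) {
        Map<Integer, Integer> counts = new TreeMap<>();
        Matcher m = EXIT.matcher(journal);
        while (m.find()) {
            counts.merge(Integer.parseInt(m.group(1)), 1, Integer::sum);
        }
        return counts;
    }

    // Reads journal text from stdin and prints one line per exit value.
    public static void main(String[] args) throws IOException {
        String journal = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        tally(journal).forEach((code, n) ->
                System.out.println("exit value " + code + ": " + n));
    }
}
```

Saved as WatchDogExitTally.java, it could be fed with something like 
`journalctl -u <unit> | java WatchDogExitTally.java`, where <unit> stands for 
the systemd unit name, which is not given here.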

After these messages, TikaServer no longer responds to requests. Restarting 
the Tika parent process is the only thing that helps. 
The messages are emitted at TikaServerWatchDog:161, but I do not understand 
what is going wrong here. The exit values may be error codes from the OS; 
perror gives the following output for them: 

OS error code   2:  No such file or directory
OS error code   3:  No such process

So it is still unclear to me what is happening. Below you'll find the 
tika.config. 

As far as I understand the situation, this looks like a bug that was introduced 
sometime between versions 2.4.x and 3.1.0. 

I hope someone has an idea what is going on and how it can be remedied. 

Tino


-- tika.config.start
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
    </parser>
  </parsers>
  <server>
    <params>
      <port>9998</port>
      <host>localhost</host>
      <digest>sha256</digest>
      <digestMarkLimit>1000000</digestMarkLimit>
      <id></id>
      <cors>NONE</cors>
      <logLevel>info</logLevel>
      <returnStackTrace>false</returnStackTrace>
      <noFork>false</noFork>
      <taskTimeoutMillis>300000</taskTimeoutMillis>
      <maxForkedStartupMillis>120000</maxForkedStartupMillis>
      <maxRestarts>-1</maxRestarts>
      <maxFiles>25000</maxFiles>
      <javaPath>java</javaPath>
      <forkedJvmArgs>
        <arg>-Xms4g</arg>
        <arg>-Xmx4g</arg>
        <arg>-Dlog4j.configurationFile=tika-forked-log4j2.xml</arg>
      </forkedJvmArgs>

      <enableUnsecureFeatures>false</enableUnsecureFeatures>

      <endpoints>
        <endpoint>status</endpoint>
        <endpoint>tika</endpoint>
        <endpoint>rmeta</endpoint>
        <endpoint>language</endpoint>
      </endpoints>
    </params>
  </server>
</properties>
-- tika.config.end
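
Since the status endpoint is enabled in the config above, the hang could be 
detected from outside by polling it. This is a minimal sketch, assuming the 
endpoint is reachable at /status under the configured host and port and answers 
with HTTP 200 while the server is healthy. The class name TikaStatusProbe and 
the 5-second timeout are illustrative choices, not part of Tika.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical probe, not part of Tika: checks whether the server still answers.
final class TikaStatusProbe {

    // Returns true only if GET <baseUrl>/status answers with HTTP 200 within the timeout.
    // Assumption: /status is the path of the "status" endpoint enabled in the config.
    static boolean isResponsive(String baseUrl, Duration timeout) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(timeout)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/status"))
                .timeout(timeout)
                .header("Accept", "application/json")
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException e) {
            // Connection refused and timeouts both count as "not responsive".
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Exits with 0 if the server answers and 1 otherwise.
    public static void main(String[] args) {
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9998";
        boolean ok = isResponsive(baseUrl, Duration.ofSeconds(5));
        System.out.println(baseUrl + (ok ? " is responsive" : " is NOT responsive"));
        System.exit(ok ? 0 : 1);
    }
}
```

The non-zero exit code could be picked up by a cron job or systemd timer to 
restart the parent process as a stopgap until the underlying cause is found.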

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
