Itai created TIKA-4186:
--------------------------

             Summary: tika server shut down innocent connections
                 Key: TIKA-4186
                 URL: https://issues.apache.org/jira/browse/TIKA-4186
             Project: Tika
          Issue Type: Bug
          Components: tika-server
    Affects Versions: 2.9.1
         Environment: macOS running tika-server-standard-2.9.1.jar
            Reporter: Itai

The Tika server shuts down and restarts in case of an issue (OOM, crash, timeout). When the Tika server shuts down, all active connections are closed, so a single problematic request has a side effect on unrelated connections. This makes it hard to make parallel calls to a single server in a production environment.

How to reproduce?
- Prepare a large sample.pdf file that takes more than 30 seconds to digest.
- Terminal 1, run: java -jar ~/Downloads/tika-server-standard-2.9.1.jar
- Terminal 2, run: curl -v -T sample.pdf http://localhost:9998/tika --header "Accept: text/plain" --header "X-Tika-Timeout-Millis: 30001"
- Wait ~20-25 seconds.
- Terminal 3, run: curl -v -T sample.pdf http://localhost:9998/tika --header "Accept: text/plain"

Expected result:
- The terminal 2 connection should time out after 30 seconds.
- The terminal 3 connection should not time out and should return successfully.

Actual result:
- Both curl commands fail after 30 seconds.

logs:
```
INFO  [qtp486662053-44] 11:57:30,251 org.apache.tika.server.core.resource.TikaResource /tika (autodetecting type)
WARN  [qtp486662053-44] 11:57:30,278 org.apache.pdfbox.pdfparser.BaseParser Empty COSName at offset 628452
ERROR [Thread-21] 11:57:37,566 org.apache.tika.server.core.ServerStatusWatcher Timeout task PARSE, millis elapsed 30014; consider increasing the allowable time with the <taskTimeoutMillis/> parameter or the X-Tika-Timeout-Millis header
WARN  [Thread-21] 11:57:37,573 org.apache.tika.server.core.ServerStatusWatcher forked process observed TIMEOUT and is shutting down.
INFO  [Thread-21] 11:57:37,613 org.apache.tika.server.core.ServerStatusWatcher Shutting down forked process with status: TIMEOUT
INFO  [pool-2-thread-1] 11:57:38,039 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 3
INFO  [main] 11:57:39,340 org.apache.tika.server.core.TikaServerProcess Starting Apache Tika 2.9.1 server
INFO  [main] 11:57:39,564 org.apache.tika.server.core.TikaServerProcess loading resource from SPI: class org.apache.tika.server.standard.resource.XMPMetadataResource
Jan 29, 2024 11:57:39 AM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://localhost:9998/
INFO  [main] 11:57:39,747 org.eclipse.jetty.util.log Logging initialized @1640ms to org.eclipse.jetty.util.log.Slf4jLog
INFO  [main] 11:57:39,790 org.eclipse.jetty.server.Server jetty-9.4.53.v20231009; built: 2023-10-09T12:29:09.265Z; git: 27bde00a0b95a1d5bbee0eae7984f891d2d0f8c9; jvm 21.0.1
INFO  [main] 11:57:39,833 org.eclipse.jetty.server.AbstractConnector Started ServerConnector@48bfb884{HTTP/1.1, (http/1.1)}{localhost:9998}
INFO  [main] 11:57:39,833 org.eclipse.jetty.server.Server Started @1729ms
```
---
```
*   Trying 127.0.0.1:9998...
* Connected to localhost (127.0.0.1) port 9998 (#0)
> PUT /tika HTTP/1.1
> Host: localhost:9998
> User-Agent: curl/7.85.0
> Accept: text/plain
> Content-Length: 636978
> Expect: 100-continue
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
* Empty reply from server
* Closing connection 0
curl: (52) Empty reply from server
```

--
This message was sent by Atlassian Jira
(v8.20.10#820010)