[ 
https://issues.apache.org/jira/browse/TIKA-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Itai updated TIKA-4186:
-----------------------
    Description: 
The Tika server shuts down and restarts in case of an issue (OOM, crash, 
timeout).
When the Tika server shuts down, all active connections are closed, so a 
single problematic request can cause unrelated requests on other connections 
to fail.

This makes it hard to make parallel calls to a single server in a production 
environment.

How to reproduce?
 - prepare a large sample.pdf file that takes more than 30 seconds to digest.

terminal 1 run:
java -jar ~/Downloads/tika-server-standard-2.9.1.jar
---
terminal 2 run:
curl -v -T sample.pdf http://localhost:9998/tika --header "Accept: text/plain" --header "X-Tika-Timeout-Millis: 30001"
---
wait ~20-25 seconds
---
terminal 3 run:
curl -v -T sample.pdf http://localhost:9998/tika --header "Accept: text/plain"
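
For convenience, the steps above could be combined into one script. This is 
only a sketch: it assumes the server from the first step is already running on 
localhost:9998 and that sample.pdf is in the current directory. The function 
names (tika_put, reproduce) and the TIKA_URL, SAMPLE and DELAY_SECS variables 
are illustrative and not part of Tika.

```shell
#!/bin/sh
# Reproduction sketch. Assumptions (not from the report): the server is
# already running, sample.pdf is in the current directory, and all function
# and variable names below are illustrative.

TIKA_URL="${TIKA_URL:-http://localhost:9998/tika}"
SAMPLE="${SAMPLE:-sample.pdf}"

# PUT the sample to the server; extra arguments are passed through to curl.
# Prints the label followed by curl's exit code.
tika_put() {
    label="$1"; shift
    curl -sS -o /dev/null -T "$SAMPLE" "$TIKA_URL" \
        --header "Accept: text/plain" "$@"
    echo "${label}: curl exit code $?"
}

reproduce() {
    # Request A carries the 30001 ms timeout header (terminal 2).
    tika_put "request A" --header "X-Tika-Timeout-Millis: 30001" &
    # Start request B partway through A's timeout window (terminal 3).
    sleep "${DELAY_SECS:-20}"
    tika_put "request B" &
    # Wait for both background uploads to finish.
    wait
}
```

After sourcing the file, calling reproduce starts both uploads and prints one 
line per request. An exit code of 0 for request B would match the expected 
result; the curl output in this report shows exit code 52 (empty reply from 
server).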

Expected result:
 - terminal 2 connection should time out after 30 seconds
 - terminal 3 connection should not time out and should return successfully.

Actual result:
 - both curl commands fail after 30 seconds.

logs:
```
INFO  [qtp486662053-44] 11:57:30,251 org.apache.tika.server.core.resource.TikaResource /tika (autodetecting type)
WARN  [qtp486662053-44] 11:57:30,278 org.apache.pdfbox.pdfparser.BaseParser Empty COSName at offset 628452
ERROR [Thread-21] 11:57:37,566 org.apache.tika.server.core.ServerStatusWatcher Timeout task PARSE, millis elapsed 30014; consider increasing the allowable time with the <taskTimeoutMillis/> parameter or the X-Tika-Timeout-Millis header
WARN  [Thread-21] 11:57:37,573 org.apache.tika.server.core.ServerStatusWatcher forked process observed TIMEOUT and is shutting down.
INFO  [Thread-21] 11:57:37,613 org.apache.tika.server.core.ServerStatusWatcher Shutting down forked process with status: TIMEOUT
INFO  [pool-2-thread-1] 11:57:38,039 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 3
INFO  [main] 11:57:39,340 org.apache.tika.server.core.TikaServerProcess Starting Apache Tika 2.9.1 server
INFO  [main] 11:57:39,564 org.apache.tika.server.core.TikaServerProcess loading resource from SPI: class org.apache.tika.server.standard.resource.XMPMetadataResource
Jan 29, 2024 11:57:39 AM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://localhost:9998/
INFO  [main] 11:57:39,747 org.eclipse.jetty.util.log Logging initialized @1640ms to org.eclipse.jetty.util.log.Slf4jLog
INFO  [main] 11:57:39,790 org.eclipse.jetty.server.Server jetty-9.4.53.v20231009; built: 2023-10-09T12:29:09.265Z; git: 27bde00a0b95a1d5bbee0eae7984f891d2d0f8c9; jvm 21.0.1
INFO  [main] 11:57:39,833 org.eclipse.jetty.server.AbstractConnector Started ServerConnector@48bfb884{HTTP/1.1, (http/1.1)}{localhost:9998}
INFO  [main] 11:57:39,833 org.eclipse.jetty.server.Server Started @1729ms
```
---
```
*   Trying 127.0.0.1:9998...
* Connected to localhost (127.0.0.1) port 9998 (#0)
> PUT /tika HTTP/1.1
> Host: localhost:9998
> User-Agent: curl/7.85.0
> Accept: text/plain
> Content-Length: 636978
> Expect: 100-continue
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
* Empty reply from server
* Closing connection 0
curl: (52) Empty reply from server
```

 

 


> tika server shut down innocent connections
> ------------------------------------------
>
>                 Key: TIKA-4186
>                 URL: https://issues.apache.org/jira/browse/TIKA-4186
>             Project: Tika
>          Issue Type: Bug
>          Components: tika-server
>    Affects Versions: 2.9.1
>         Environment: macOS running tika-server-standard-2.9.1.jar
>            Reporter: Itai
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
