[ https://issues.apache.org/jira/browse/TIKA-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
mbiso updated TIKA-4367:
------------------------
    Description: 
Hi.
I have this problem with my tika-server running in a Docker container.
Because of large files, I get a timeout and the Tika process goes down.
This is the error:

{code:java}
2025-01-16T01:29:19.096206347Z INFO [qtp274100821-133] 02:29:19,096 org.apache.tika.server.core.resource.MetadataResource /meta (application/pdf)
2025-01-16T01:29:19.120130385Z INFO [qtp274100821-270] 02:29:19,120 org.apache.tika.server.core.resource.TikaResource /tika (application/pdf)
2025-01-16T01:29:19.213411527Z INFO [qtp274100821-133] 02:29:19,213 org.apache.tika.server.core.resource.MetadataResource /meta (application/pdf)
2025-01-16T01:29:19.230454549Z INFO [qtp274100821-270] 02:29:19,230 org.apache.tika.server.core.resource.TikaResource /tika (application/pdf)
2025-01-16T01:56:18.370380628Z INFO [qtp274100821-284] 02:56:18,370 org.apache.tika.server.core.resource.MetadataResource /meta (application/pdf)
2025-01-16T02:01:18.430280014Z ERROR [Thread-11] 03:01:18,428 org.apache.tika.server.core.ServerStatusWatcher Timeout task PARSE, millis elapsed 300055; consider increasing the allowable time with the <taskTimeoutMillis/> parameter or the X-Tika-Timeout-Millis header
2025-01-16T02:01:18.437740057Z WARN [Thread-11] 03:01:18,437 org.apache.tika.server.core.ServerStatusWatcher forked process observed TIMEOUT and is shutting down.
2025-01-16T02:01:18.439693546Z INFO [Thread-11] 03:01:18,439 org.apache.tika.server.core.ServerStatusWatcher Shutting down forked process with status: TIMEOUT
2025-01-16T02:01:19.851234798Z INFO [pool-2-thread-1] 03:01:19,817 org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 3
2025-01-16T02:01:20.644728948Z INFO [main] 03:01:20,643 org.apache.tika.server.core.TikaServerProcess Starting Apache Tika 3.0.0 server
2025-01-16T02:01:20.773526359Z INFO [main] 03:01:20,772 org.apache.tika.server.core.TikaServerProcess Using custom config: /tika-config.xml
2025-01-16T02:01:21.358160073Z INFO [main] 03:01:21,357 org.apache.tika.server.core.TikaServerProcess loading resource from SPI: class org.apache.tika.server.standard.resource.XMPMetadataResource
2025-01-16T02:01:21.527210481Z Jan 16, 2025 3:01:21 AM org.apache.cxf.endpoint.ServerImpl initDestination
2025-01-16T02:01:21.527237406Z INFO: Setting the server's publish address to be http://0.0.0.0:9998/
2025-01-16T02:01:21.627014872Z INFO [main] 03:01:21,626 org.eclipse.jetty.server.Server jetty-11.0.24; built: 2024-08-26T18:11:22.448Z; git: 5dfc59a691b748796f922208956bd1f2794bcd16; jvm 21.0.5+11-Ubuntu-1ubuntu124.04
2025-01-16T02:01:21.685264827Z INFO [main] 03:01:21,684 org.eclipse.jetty.server.AbstractConnector Started ServerConnector@50b1f030{HTTP/1.1, (http/1.1)} {0.0.0.0:9998}
2025-01-16T02:01:21.687671013Z INFO [main] 03:01:21,687 org.eclipse.jetty.server.Server Started Server@6034e75d{STARTING}[11.0.24,sto=0] @1755ms
2025-01-16T02:01:21.711747262Z INFO [main] 03:01:21,711 org.eclipse.jetty.server.handler.ContextHandler Started o.a.c.t.h.JettyContextHandler@56febdc{/,null,AVAILABLE}
2025-01-16T02:01:21.716535893Z INFO [main] 03:01:21,716 org.apache.tika.server.core.TikaServerProcess Started Apache Tika server 5598029c-6de7-4b53-8284-0f18814c049f at http://0.0.0.0:9998/
{code}

My issue is that, because ManifoldCF uses Tika to parse the files, the ManifoldCF job ends with: "Error: Repeated service interruptions - failure processing document: The target server failed to respond"
Is there a way to avoid the shutdown of the Tika process on timeout?
In the attachment you will find my tika-config.xml, in case it helps.
Thanks a lot
Mario

> Problem with the: org.apache.tika.server.core.ServerStatusWatcher forked
> process observed TIMEOUT and is shutting down
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-4367
>                 URL: https://issues.apache.org/jira/browse/TIKA-4367
>             Project: Tika
>          Issue Type: Bug
>          Components: tika-server
>    Affects Versions: 3.0.0
>            Reporter: mbiso
>            Priority: Major
>         Attachments: tika-config.xml
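
The ERROR line in the log names two ways to allow more time: the <taskTimeoutMillis/> parameter and the X-Tika-Timeout-Millis request header. As a minimal sketch of the per-request header only, the snippet below builds (without sending) a PUT request to the /tika endpoint seen in the log. The host name, the helper name build_parse_request, and the 900000 ms value are illustrative assumptions, not taken from the report, and whether a longer timeout is enough for a given file is not something this sketch can establish.

```python
import urllib.request

# Assumed address: the log shows the server publishing on port 9998;
# "localhost" is a placeholder for wherever the container is reachable.
TIKA_URL = "http://localhost:9998/tika"


def build_parse_request(
    data: bytes,
    timeout_millis: int,
    url: str = TIKA_URL,
    content_type: str = "application/pdf",
) -> urllib.request.Request:
    """Build a PUT request that asks tika-server for a per-request timeout.

    The header name X-Tika-Timeout-Millis is the one quoted in the
    server's own ERROR message; everything else here is illustrative.
    """
    if timeout_millis <= 0:
        raise ValueError("timeout_millis must be positive")
    headers = {
        "Content-Type": content_type,
        "Accept": "text/plain",
        "X-Tika-Timeout-Millis": str(timeout_millis),
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


# Usage (not executed here, needs a running server):
#   with open("large.pdf", "rb") as fh:
#       req = build_parse_request(fh.read(), timeout_millis=900_000)
#   with urllib.request.urlopen(req, timeout=960) as resp:
#       text = resp.read().decode("utf-8")
```

The client-side socket timeout in the usage comment is set slightly above the requested server-side value so that the client does not give up before the server does; that margin is a design choice of this sketch, not a documented requirement.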