[ https://issues.apache.org/jira/browse/TIKA-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732196#comment-15732196 ]
Tim Allison commented on TIKA-2180:
-----------------------------------

If I don't read from the response, after a few files, I get this:
{noformat}
SEVERE: Problem with writing the data, class org.apache.tika.server.resource.TikaResource$4, ContentType: text/plain
Dec 08, 2016 8:13:56 AM org.apache.cxf.phase.PhaseInterceptorChain doDefaultLogging
WARNING: Interceptor for {http://resource.server.tika.apache.org/}TikaResource has thrown exception, unwinding now
org.apache.cxf.interceptor.Fault: Could not send Message.
    at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:64)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
    at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
    at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
    at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:366)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:957)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.eclipse.jetty.io.EofException
    at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
    at org.eclipse.jetty.server.AbstractHttpConnection.flushResponse(AbstractHttpConnection.java:686)
    at org.eclipse.jetty.server.AbstractHttpConnection$Output.close(AbstractHttpConnection.java:1108)
    at org.apache.cxf.transport.http_jetty.JettyHTTPDestination$JettyOutputStream.close(JettyHTTPDestination.java:332)
    at org.apache.cxf.transport.http.AbstractHTTPDestination$WrappedOutputStream.close(AbstractHTTPDestination.java:790)
    at org.apache.cxf.transport.AbstractConduit.close(AbstractConduit.java:56)
    at org.apache.cxf.transport.http.AbstractHTTPDestination$BackChannelConduit.close(AbstractHTTPDestination.java:720)
    at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:62)
    ... 24 more
Caused by: java.io.IOException: An established connection was aborted by the software in your host machine
    at sun.nio.ch.SocketDispatcher.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:51)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
    at org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:293)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:404)
    at org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:341)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:378)
    at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:841)
    ... 31 more
{noformat}

However, if I do something like this:

    for (int i = 0; i < 20; i++) {
        try (InputStream is = TikaInputStream.get(new File("C:/data/test_in/docx/Document (1) - Copy.docx"))) {
            Response response = WebClient.create(endPoint + TIKA_PATH)
                    //.type("application/rtf")
                    .accept("text/plain")
                    .put(is);
            Path outFile = Paths.get("C:/data/test_out/out_"+i+".txt");
            if (Files.isRegularFile(outFile)) {
                Files.delete(outFile);
            }
            Files.copy((InputStream)response.getEntity(), outFile);
            System.out.println("RESULT: " + response.getStatus());
            if (response.getStatus() != 200) {
                ex++;
            }
        }
        fileCount++;
    }

I don't get any exceptions. I'm not sure whether it is the delay from copying the bytes out that prevents the exceptions, or whether reading the content is itself necessary to prevent them.

Also, my memory usage with the new parser never goes above 300MB. Are you sure you're using the new parser?

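One way to tell the two explanations apart might be to read the entity to the end without writing it to disk, so the body is still consumed but the file I/O delay is mostly gone. A minimal sketch, assuming the entity can be cast to an InputStream as in the loop above; `DrainExample` and `drain` are hypothetical names (not part of Tika or CXF), and the ByteArrayInputStream is only a stand-in for the real response entity:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DrainExample {

    /** Reads the stream to EOF, discarding the bytes, and returns how many were read. */
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for (InputStream) response.getEntity(); an assumption for illustration.
        try (InputStream entity = new ByteArrayInputStream(new byte[20000])) {
            System.out.println("drained bytes: " + drain(entity));
        }
    }
}
```

In the loop above, `drain((InputStream) response.getEntity())` would take the place of the `Files.copy(...)` call. If the exceptions stay away, reading the content is probably what matters rather than the delay.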
> Multiple requests on Tika to extract text slows down
> ----------------------------------------------------
>
> Key: TIKA-2180
> URL: https://issues.apache.org/jira/browse/TIKA-2180
> Project: Tika
> Issue Type: Bug
> Components: server
> Affects Versions: 1.13, 1.14
> Environment: Windows OS, Open JDK, 4 core 32 GB RAM
> Reporter: Ashish Basran
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, with new experimental SAX docx parser.png
>
>
> I observed that if I send multiple requests to Tika (eg. http://localhost:8080/tika) with around 5MB files, Tika is very slow in completing the action. I tried with ~20 random files, it took 170 seconds to process all the files in sequence. If I pass all files in parallel, it took around 780 seconds to process same set of files.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)