[ https://issues.apache.org/jira/browse/TIKA-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732196#comment-15732196 ]

Tim Allison commented on TIKA-2180:
-----------------------------------

If I don't read from the response, after a few files, I get this:
{noformat}
SEVERE: Problem with writing the data, class org.apache.tika.server.resource.TikaResource$4, ContentType: text/plain
Dec 08, 2016 8:13:56 AM org.apache.cxf.phase.PhaseInterceptorChain doDefaultLogging
WARNING: Interceptor for {http://resource.server.tika.apache.org/}TikaResource has thrown exception, unwinding now
org.apache.cxf.interceptor.Fault: Could not send Message.
        at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:64)
        at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
        at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
        at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
        at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
        at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
        at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
        at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:366)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:957)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.eclipse.jetty.io.EofException
        at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
        at org.eclipse.jetty.server.AbstractHttpConnection.flushResponse(AbstractHttpConnection.java:686)
        at org.eclipse.jetty.server.AbstractHttpConnection$Output.close(AbstractHttpConnection.java:1108)
        at org.apache.cxf.transport.http_jetty.JettyHTTPDestination$JettyOutputStream.close(JettyHTTPDestination.java:332)
        at org.apache.cxf.transport.http.AbstractHTTPDestination$WrappedOutputStream.close(AbstractHTTPDestination.java:790)
        at org.apache.cxf.transport.AbstractConduit.close(AbstractConduit.java:56)
        at org.apache.cxf.transport.http.AbstractHTTPDestination$BackChannelConduit.close(AbstractHTTPDestination.java:720)
        at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:62)
        ... 24 more
Caused by: java.io.IOException: An established connection was aborted by the software in your host machine
        at sun.nio.ch.SocketDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:51)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
        at org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:293)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:404)
        at org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:341)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:378)
        at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:841)
        ... 31 more
{noformat}

However, if I do something like this:
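        for (int i = 0; i < 20; i++) {
            // PUT the same .docx to tika-server 20 times in a row.
            // endPoint, TIKA_PATH, ex and fileCount are defined elsewhere in the test.
            try (InputStream is = TikaInputStream.get(new File("C:/data/test_in/docx/Document (1) - Copy.docx"))) {
                Response response = WebClient.create(endPoint + TIKA_PATH)
                        //.type("application/rtf")
                        .accept("text/plain")
                        .put(is);

                // Read the full extracted text out of the response into a file.
                Path outFile = Paths.get("C:/data/test_out/out_" + i + ".txt");
                if (Files.isRegularFile(outFile)) {
                    Files.delete(outFile); // Files.copy fails if the target already exists
                }
                Files.copy((InputStream) response.getEntity(), outFile);
                System.out.println("RESULT: " + response.getStatus());
                if (response.getStatus() != 200) {
                    ex++; // count failed requests
                }
            }
            fileCount++;
        }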

I don't get any exceptions.

I'm not sure whether it is the delay introduced by copying the bytes out that
prevents the exceptions, or whether reading the content is actually necessary
to prevent them.
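For anyone trying to reproduce this outside CXF, here is a minimal sketch of the "always read the whole response" pattern using the JDK's java.net.http client (Java 11+) instead of CXF's WebClient. The class name TikaPutClient is hypothetical and not part of Tika, and the endpoint URI is an assumption: use whatever your tika-server listens on (the issue below mentions http://localhost:8080/tika). BodyHandlers.ofString() does not return until the complete body has arrived, so the client should never drop the connection while the server is still writing to it, which is what the EofException above suggests happened. It doesn't settle whether the delay or the read is what matters; it just makes the read explicit.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper (not part of Tika): PUTs a file to a tika-server
// endpoint and always consumes the complete response body.
final class TikaPutClient {
    // HTTP/1.1 keeps the exchange comparable to the CXF/Jetty setup above.
    private final HttpClient client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .build();
    private final URI endpoint;

    TikaPutClient(URI endpoint) {
        this.endpoint = endpoint;
    }

    /** Sends the file as the request body and returns the extracted text. */
    String extractText(Path file) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofFile(file))
                .build();
        // ofString() buffers the entire body before send() returns, so the
        // connection is not released while the server is still writing.
        HttpResponse<String> response = client.send(
                request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + file);
        }
        return response.body();
    }
}
```

Usage would look like `new TikaPutClient(URI.create("http://localhost:8080/tika")).extractText(Paths.get("C:/data/test_in/docx/Document (1) - Copy.docx"))`, called in the same 20-iteration loop as the snippet above.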


Also, my memory usage with the new parser never goes above 300 MB. Are you sure
you're using the new parser?

> Multiple requests on Tika to extract text slows down
> ----------------------------------------------------
>
>                 Key: TIKA-2180
>                 URL: https://issues.apache.org/jira/browse/TIKA-2180
>             Project: Tika
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 1.13, 1.14
>         Environment: Windows OS, Open JDK, 4 core 32 GB RAM
>            Reporter: Ashish Basran
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, with new experimental SAX docx parser.png
>
>
> I observed that if I send multiple requests to Tika (e.g.
> http://localhost:8080/tika) with files of around 5MB, Tika is very slow to
> complete them. With ~20 random files, processing them in sequence took 170
> seconds; sending all the files in parallel took around 780 seconds for the
> same set.
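For scale: 170 s over ~20 files is roughly 8.5 s per file in sequence, and the parallel run (780 s) took about 4.6 times as long as the sequential one for the same files. A minimal timing harness for repeating that comparison could look like the sketch below. It is hypothetical: each Callable stands in for one request to /tika (for example one iteration of the PUT loop in the comment above), and the pool size is an assumption, since the report doesn't say how many requests were in flight at once.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness: compares wall-clock time for running the same
// tasks one after another versus all at once on a thread pool.
final class SequentialVsParallel {

    /** Runs every task in order on the calling thread. */
    static Duration timeSequential(List<Callable<?>> tasks) throws Exception {
        long start = System.nanoTime();
        for (Callable<?> task : tasks) {
            task.call();
        }
        return Duration.ofNanos(System.nanoTime() - start);
    }

    /** Submits all tasks at once to a fixed pool and waits for every one. */
    static Duration timeParallel(List<Callable<?>> tasks, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<?>> futures = new ArrayList<>();
            for (Callable<?> task : tasks) {
                futures.add(pool.submit(task));
            }
            for (Future<?> f : futures) {
                f.get(); // blocks until done; rethrows any task failure
            }
            return Duration.ofNanos(System.nanoTime() - start);
        } finally {
            pool.shutdown();
        }
    }
}
```

Running both methods against the same list of ~20 requests, with the pool size set to match the client's concurrency, should make it easier to see whether the slowdown tracks the number of concurrent requests.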



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
