Hi Erlend,

The "Interrupted: null" message with a -104 code means only that the fetch was interrupted by something. Unfortunately, the message does not say what caused the interruption. This is unrelated to ZooKeeper, but I agree it is suspicious that many such interruptions appear right after robots.txt is parsed.
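One way to see how tightly the -104 entries cluster is to count them per second in manifoldcf.log. Here is a minimal Python sketch, assuming each fetch entry sits on a single line in the same layout as the entry quoted further down (URL|start+elapsed|result code|bytes|...). The regex and the function name `interrupted_per_second` are illustrative only and are not part of ManifoldCF.

```python
import re
from collections import Counter

# Pattern modelled on the single log entry quoted in this thread. The field
# layout after "FETCH URL|" is an assumption drawn from that one entry.
FETCH_RE = re.compile(
    r"^\s*INFO (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} "
    r"\(Worker thread '\d+'\) - WEB: FETCH URL\|"
    r"(?P<url>[^|]*)\|[^|]*\|(?P<code>-?\d+)\|"
)


def interrupted_per_second(lines, code=-104):
    """Count fetch entries with the given result code, grouped by second."""
    counts = Counter()
    for line in lines:
        m = FETCH_RE.match(line)
        if m and int(m.group("code")) == code:
            # Key on the timestamp truncated to whole seconds.
            counts[m.group("ts")] += 1
    return counts
```

Calling `interrupted_per_second(open("manifoldcf.log", encoding="utf-8"))` and looking at `most_common(10)` on the result would show whether the interruptions pile up in the same second or two after the robots fetch.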
One cause of a -104 is the target server forcibly dropping the connection, so that an InterruptedIOException is thrown. Looking at the timestamps of the fetch messages, it seems plausible that you exceeded some predetermined limit on that machine: they are all within a few milliseconds of each other. When a robots file needs to be read, ManifoldCF creates an event for that, and the URLs blocked by that event all become 'fetchable' as soon as the event is released. Perhaps your throttling needs to be adjusted now that the rate limit bug has been fixed?

I won't be able to work on this without at least your crawling parameters for the server in question. I can ping that server, so if you would like, I can try crawling it from here.

For ZooKeeper, I would still try either to increase your tick count to maybe 10000 or, better yet, to find out why you periodically lose the ability to transmit pings from MCF to your ZooKeeper process.

Thanks,
Karl

On Thu, Sep 18, 2014 at 7:15 AM, Erlend Garåsen <e.f.gara...@usit.uio.no> wrote:

> On 18.09.14 13:00, Karl Wright wrote:
>
>> Hi Erlend,
>>
>> please can you also add the manifoldcf log as well?
>>
>
> Yes, I will, but it includes entries from RC0 as well.
>
> MCF works perfectly using the other jobs for the other hosts. Take a look
> at the following once again. MCF is being interrupted:
> INFO 2014-09-18 11:13:42,824 (Worker thread '19') - WEB: FETCH URL|
> https://www.duo.uio.no/|1411030940209+682605|-104|
> 4096|org.apache.manifoldcf.core.interfaces.ManifoldCFException|
> Interrupted: Interrupted: null
>
> You can find this entry near the other regarding the robots.txt file:
> http://folk.uio.no/erlendfg/manifoldcf/manifoldcf.log
>
> Erlend