I was able to reproduce the exception here using your URL. It is indeed a bug in how the 500 error is handled. I've checked in a fix, and I will spin a new RC as soon as we resolve the Maven issue. That one turns out to be much thornier; had you run with -DskipITs or -DskipTests instead, it would have worked.
Karl

On Fri, Sep 28, 2012 at 7:31 AM, Erlend Garåsen <[email protected]> wrote:
>
> OK, I will give you a stack trace at the beginning of next week.
>
> I will start the crawler once more, check the results when I'm back, and
> change my vote then if everything is OK.
>
> Erlend
>
>
> On 28.09.12 13.26, Karl Wright wrote:
>>
>> "Meanwhile, the following is filling up my log:
>> FATAL 2012-09-28 11:42:32,112 (Worker thread '29') - Error tossed:
>> String index out of range: -1
>> java.lang.StringIndexOutOfBoundsException: String index out of range: -1"
>>
>> I agree this is a problem we should fix, but to do that I need a
>> stack trace. It is not at all clear that it is related to the 500
>> error you described before, though it could be. I will create a
>> ticket for it anyway.
>> Karl
>>
>> On Fri, Sep 28, 2012 at 5:49 AM, Erlend Garåsen <[email protected]>
>> wrote:
>>>
>>> I'm trying to start a crawl before I have to leave for the airport. I just
>>> discovered that MCF recrawls the same host over and over again when it
>>> returns result code 500:
>>> 09-28-2012 11:40:11.024 fetch
>>> http://foreninger.uio.no/go/oslo_open_2012_no.php
>>> 500
>>>
>>> It's not just this document; several others return the same HTTP
>>> result code.
>>>
>>> Meanwhile, the following is filling up my log:
>>> FATAL 2012-09-28 11:42:32,112 (Worker thread '29') - Error tossed: String index out of range: -1
>>> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>>>
>>> I'm pretty sure they are related to each other.
>>>
>>> I will end this job before I leave, because I'm afraid that MCF will try
>>> to fetch these documents over and over again during the weekend.
>>>
>>> Erlend
>>>
>>>
>>> On 28.09.12 09.58, Karl Wright wrote:
>>>>
>>>> Please vote +1 to release ManifoldCF 1.0, RC5.
>>>> The release artifact can be found at:
>>>>
>>>> http://people.apache.org/~kwright/apache-manifoldcf-1.0
>>>>
>>>> There is also an SVN tag at:
>>>>
>>>> https://svn.apache.org/repos/asf/manifoldcf/tags/release-1.0-RC5
>>>>
>>>> Fixes since RC4:
>>>>
>>>> CONNECTORS-545
>>>>
>>>> Fixes since RC3:
>>>>
>>>> CONNECTORS-544
>>>>
>>>
>>>
>>> -- 
>>> Erlend Garåsen
>>> Center for Information Technology Services
>>> University of Oslo
>>> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
>>> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
>
>
>
> -- 
> Erlend Garåsen
> Center for Information Technology Services
> University of Oslo
> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
