On May 7, 2007, at 9:07 AM, Dennis Kubes wrote:
Brian Whitman wrote:
Hi all,
I looked into this a bit more after it crashed for the third time in a row. Every time it has segfaulted, this URL has been among the last few fetches: fetching http://www.c bs.nu/cgi-bin/ac/adcycle.cgi? gid=4&layout=multi&id=125 Note the space in there. This URL is not in my initial fetchlist, so it was discovered somewhere along the way. I'm not sure whether the space is really a space or an encoding/terminal display issue; either way, I think it has something to do with the crash. Does anyone know what happens when Java/Nutch gets a hostname that is obviously malformed?

I believe it should throw a MalformedURLException.
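A quick way to see whether a string like the one above is even a syntactically valid URL is to hand it to a strict parser. The sketch below is an illustration, not Nutch's own URL filtering code: it uses `java.net.URI`, which rejects illegal characters such as a space in the authority with a `URISyntaxException`. (`java.net.URL`'s own parsing is more lenient and may not reject it, depending on the JDK.) The class and method names are hypothetical, and `www.example.com` stands in for a well-formed host.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class MalformedUrlCheck {

    /** Returns true if java.net.URI accepts the string as a syntactically valid URI. */
    public static boolean isValidUri(String s) {
        try {
            new URI(s); // strict RFC 2396 parse; throws on illegal characters
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The host contains a space, so the authority component is illegal.
        String bad = "http://www.c bs.nu/cgi-bin/ac/adcycle.cgi?gid=4&layout=multi&id=125";
        // Hypothetical well-formed counterpart for comparison.
        String good = "http://www.example.com/cgi-bin/ac/adcycle.cgi?gid=4&layout=multi&id=125";
        System.out.println(bad + " -> " + isValidUri(bad));
        System.out.println(good + " -> " + isValidUri(good));
    }
}
```

A check like this only tells you whether the URL parses; it says nothing about why the JVM itself segfaulted.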

OK. I got the crash again today on different URLs. It's strange, because I've been crawling quite regularly with the same Nutch setup for a while. It's possible that a recent system-level change is getting in the way (I'm running Debian and recently did a full upgrade).

After googling the culprit for a while, I found this trick:

-Djava.net.preferIPv4Stack=true

I'm running a large crawl with it now and will let you know if the crash hasn't come back after a while!
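`java.net.preferIPv4Stack` is a standard JVM networking system property. It is generally read when the networking classes initialize, so it has to be passed on the `java` command line (`-Djava.net.preferIPv4Stack=true`) rather than set from code mid-run. For Nutch that means adding it to whatever JVM options your `bin/nutch` launcher uses; if fetching runs in separate Hadoop task JVMs, those may need it too (for example via `mapred.child.java.opts`). The hypothetical helper below is just a sketch for confirming that a given JVM actually picked the flag up.

```java
public class Ipv4StackCheck {

    /**
     * True if the system property java.net.preferIPv4Stack is present and equal
     * to "true" (case-insensitive), e.g. when the JVM was launched with
     * -Djava.net.preferIPv4Stack=true.
     */
    public static boolean prefersIPv4Stack() {
        return Boolean.getBoolean("java.net.preferIPv4Stack");
    }

    public static void main(String[] args) {
        // Only reports the property value; changing it here would generally be
        // too late to affect the already-initialized network stack.
        System.out.println("java.net.preferIPv4Stack = " + prefersIPv4Stack());
    }
}
```

Running `java -Djava.net.preferIPv4Stack=true Ipv4StackCheck` should report `true`; without the flag it reports `false`.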

-Brian

