The general problem is URLs like: http://www.agriturismo.pg.it/storia-citta-umbria/index.html
These are custom "not found" pages that generate an infinite crawler loop on the site.
You're referring to error pages that do not return 404?
In another thread I just suggested a way to handle these:
http://www.mail-archive.com/nutch-user%40incubator.apache.org/msg00286.html
The URL you mention is amenable to this solution. The page's title contains the string "pagina di errore", but it does not return a 404.
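
As a rough illustration of that kind of check, here is a minimal sketch in Python. It is not Nutch code and not necessarily what the linked thread proposes. The names `ERROR_TITLE_MARKERS`, `TitleParser` and `is_soft_404` are hypothetical, and the only marker used is the "pagina di errore" string from the page above. It assumes you already have the HTTP status and the HTML body of a fetched page.

```python
from html.parser import HTMLParser

# Hypothetical marker list; "pagina di errore" is the string seen in the
# title of the page discussed above. Extend per site as needed.
ERROR_TITLE_MARKERS = ("pagina di errore",)


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def is_soft_404(status, html):
    """Return True if the page was served with status 200 but its title
    contains one of the known error markers (case-insensitive)."""
    if status != 200:
        return False  # a real error status needs no special handling
    parser = TitleParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    title = " ".join(parser.title.split()).lower()
    return any(marker in title for marker in ERROR_TITLE_MARKERS)
```

A crawler could call something like `is_soft_404(status, html)` after each fetch and treat a True result the same way it treats a real 404, so that links on the error page are not followed.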
Doug
