Our web server has been receiving a lot of failing traffic from shopping.com and irl.cs.tamu.edu.
I believe your crawler is seeing "&section" and replacing it with "§ion". It requested:

http://www.businessair.com/avdealers.cfm?alpha_choice=ALL§ion=AC&first_sort_by_column=DEALRSTATE&sort_by_columns=DEALRSTATEDESC,DEALRNAME,CITY,DESCRIPTION

The URL should be...

http://www.businessair.com/avdealers.cfm?alpha_choice=ALL&section=AC&first_sort_by_column=DEALRSTATE&sort_by_columns=DEALRSTATEDESC,DEALRNAME,CITY,DESCRIPTION

This would only be a minor problem, except that your bot is sending several requests while waiting only a second between them. A typical user can only click on the page a few seconds after the request has been fulfilled. It doesn't look like your bot even waited for the page to finish loading. A request should be made at most once every 15-20 seconds; otherwise, a system admin could see the above actions as a Denial of Service attack.

As for "&section" being replaced with "§ion": in the file ... nutch\html\Entities.java there is an area that adds special characters. However, I believe that those special characters are supposed to start with & and end in ; (i.e., &sect; or &nbsp;). I have not recompiled the code yet, but I believe that matching only the complete form, including the trailing ;, should remedy the problem.

Please keep me informed of your progress, or I will be forced to block your bots (which I would prefer not to do).

Thanks.

Sincerely,
Fred

><><><><><><><><><><><><><><><><><><
Fred Tyre
Information Services
Heartland Communications, Inc.
515-574-2147
[EMAIL PROTECTED]
><><><><><><><><><><><><><><><><><><
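
For illustration, the entity rule suggested above (only decode a name that starts with & and ends in ;) might look roughly like the Java sketch below. It is not taken from Nutch's Entities.java, whose actual contents are not shown in this message. The class name StrictEntityDecoder, the decode method, and the three-entry table are all hypothetical, and the sketch has not been tried against the crawler.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not the real Nutch class: decodes a named entity
// only when it is written in full, with a leading '&' and a trailing ';'.
final class StrictEntityDecoder {

    // Tiny illustrative table; a real decoder would carry the full entity list.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("sect", "\u00A7"); // section sign
        ENTITIES.put("nbsp", "\u00A0"); // non-breaking space
        ENTITIES.put("amp", "&");
    }

    static String decode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '&') {
                // Look for the terminating ';'. Without it, nothing is decoded.
                int semi = text.indexOf(';', i + 1);
                if (semi > i + 1) {
                    String replacement = ENTITIES.get(text.substring(i + 1, semi));
                    if (replacement != null) {
                        out.append(replacement);
                        i = semi + 1;
                        continue;
                    }
                }
            }
            // Not a complete, known entity: copy the character unchanged.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The query string has "&section" with no ';', so it must survive intact.
        System.out.println(decode(
            "http://www.businessair.com/avdealers.cfm?alpha_choice=ALL&section=AC"));
        // A properly terminated entity is still decoded.
        System.out.println(decode("See &sect; 4"));
    }
}
```

With a rule like this, "&section=AC" in a query string is left alone because no known entity name sits between the & and the next ;, while a complete "&sect;" is still converted.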