Please reply to me directly as well, as I'm not on the nutch-dev list regularly.

I'm curious ... Googlebot, Yahoo Slurp, and now CazoodleBot (based on Nutch) are crawling our site at http://www.nines.org, and I'm seeing all sorts of invalid links requested. Is our site doing something wrong in its markup? Or are all these crawlers flawed, requesting nonsensical URLs?

Thanks,
        Erik


Begin forwarded message:

From: Application Error <[EMAIL PROTECTED]>
Date: July 10, 2007 6:31:45 AM EDT
To: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: [Collex] application#index (ActionController::RoutingError) "no route found to match \"/nines/ escape(document.title) u,\" with {:method=>:get}"

A ActionController::RoutingError occurred in application#index:

no route found to match "/nines/ escape(document.title) u," with {:method=>:get}
[RAILS_ROOT]/vendor/rails/actionpack/lib/action_controller/routing.rb:1292:in `recognize_path'

-------------------------------
Request:
-------------------------------

  * URL: http://www.nines.org/nines/+escape(document.title)+u,
  * Parameters: {}
  * Rails root: /usr/local/patacriticism/production-web/current

-------------------------------
Session:
-------------------------------

  * @data: {"flash"=>{}}
  * @new_session: true
  * @write_lock: true
  * @session_id: "7faa9cc7b50752112ed9a1b8aab220ec"

-------------------------------
Environment:
-------------------------------

  * DOCUMENT_ROOT       : /usr/local/patacriticism/production-web/current/public
  * FCGI_ROLE           : RESPONDER
  * GATEWAY_INTERFACE   : CGI/1.1
  * HTTP_ACCEPT_ENCODING: x-gzip, gzip
  * HTTP_HOST           : www.nines.org
  * HTTP_USER_AGENT     : CazoodleBot/Nutch-0.9-dev (CazoodleBot Crawler; http://www.cazoodle.com/cazoodlebot; [EMAIL PROTECTED])
  * PATH                : /usr/jdk1.5.0_06/bin:/usr/local/nines/bin:/opt/csw/bin:/usr/local/bin:/uva/bin:/usr/bin:/opt/SUNWspro/bin:/usr/ucb:/usr/openwin/bin:/usr/dt/bin:/usr/ccs/bin:/gnu/bin:/contrib/bin:/home/loadl/bin:/usr/local/ant/bin
  * QUERY_STRING        :
  * REDIRECT_STATUS     : 200
  * REDIRECT_UNIQUE_ID  : RpNgEYCPFU0AACruKkQ
  * REDIRECT_URL        : /nines/+escape(document.title)+u,
  * REMOTE_ADDR         : 72.36.94.103
  * REMOTE_PORT         : 58936
  * REQUEST_METHOD      : GET
  * REQUEST_URI         : /nines/+escape(document.title)+u,
  * SCRIPT_FILENAME     : /usr/local/patacriticism/production-web/current/public/dispatch.fcgi
  * SCRIPT_NAME         : /dispatch.fcgi
  * SERVER_ADDR         : 128.143.21.77
  * SERVER_ADMIN        : [EMAIL PROTECTED]
  * SERVER_NAME         : nines.org
  * SERVER_PORT         : 80
  * SERVER_PROTOCOL     : HTTP/1.0
  * SERVER_SIGNATURE    : <ADDRESS>Apache/1.3.36 Server at nines.org Port 80</ADDRESS>
  * SERVER_SOFTWARE     : Apache/1.3.36 (Unix) mod_fastcgi/2.4.2 PHP/4.3.3 mod_perl/1.29
  * TZ                  : US/Eastern
  * UNIQUE_ID           : RpNgEYCPFU0AACruKkQ

  * Process: 10999
  * Server :

-------------------------------
Backtrace:
-------------------------------

  [RAILS_ROOT]/vendor/rails/actionpack/lib/action_controller/routing.rb:1292:in `recognize_path'
  [RAILS_ROOT]/vendor/rails/actionpack/lib/action_controller/routing.rb:1282:in `recognize'
  [RAILS_ROOT]/vendor/rails/railties/lib/dispatcher.rb:40:in `dispatch'
  [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:168:in `process_request'
  [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:143:in `process_each_request!'
  [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:109:in `with_signal_handler'
  [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:142:in `process_each_request!'
  /opt/csw/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:612:in `each_cgi'
  /opt/csw/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:609:in `each'
  /opt/csw/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:609:in `each_cgi'
  [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:141:in `process_each_request!'
  [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:55:in `process!'
  [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:25:in `process!'
  [RAILS_ROOT]/public/dispatch.fcgi:24

