Please reply to me directly as well, as I'm not on the nutch-dev list
regularly.
I'm curious ... Googlebot, Yahoo Slurp, and now CazoodleBot (based on
Nutch) are crawling our site at http://www.nines.org, and I see them
requesting all sorts of invalid links. Is our site doing something
wrong in its markup? Or are all these crawlers flawed, in that they
request nonsensical URLs?
Thanks,
Erik
Begin forwarded message:
From: Application Error <[EMAIL PROTECTED]>
Date: July 10, 2007 6:31:45 AM EDT
To: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: [Collex] application#index
(ActionController::RoutingError) "no route found to match \"/nines/
escape(document.title) u,\" with {:method=>:get}"
A ActionController::RoutingError occurred in application#index:
no route found to match "/nines/ escape(document.title) u," with
{:method=>:get}
[RAILS_ROOT]/vendor/rails/actionpack/lib/action_controller/
routing.rb:1292:in `recognize_path'
-------------------------------
Request:
-------------------------------
* URL: http://www.nines.org/nines/+escape(document.title)+u,
* Parameters: {}
* Rails root: /usr/local/patacriticism/production-web/current
-------------------------------
Session:
-------------------------------
* @data: {"flash"=>{}}
* @new_session: true
* @write_lock: true
* @session_id: "7faa9cc7b50752112ed9a1b8aab220ec"
-------------------------------
Environment:
-------------------------------
* DOCUMENT_ROOT : /usr/local/patacriticism/production-web/
current/public
* FCGI_ROLE : RESPONDER
* GATEWAY_INTERFACE : CGI/1.1
* HTTP_ACCEPT_ENCODING: x-gzip, gzip
* HTTP_HOST : www.nines.org
* HTTP_USER_AGENT : CazoodleBot/Nutch-0.9-dev (CazoodleBot
Crawler; http://www.cazoodle.com/cazoodlebot;
[EMAIL PROTECTED])
* PATH : /usr/jdk1.5.0_06/bin:/usr/local/nines/
bin:/opt/csw/bin:/usr/local/bin:/uva/bin:/usr/bin:/opt/SUNWspro/
bin:/usr/ucb:/usr/openwin/bin:/usr/dt/bin:/usr/ccs/bin:/gnu/bin:/
contrib/bin:/home/loadl/bin:/usr/local/ant/bin
* QUERY_STRING :
* REDIRECT_STATUS : 200
* REDIRECT_UNIQUE_ID : RpNgEYCPFU0AACruKkQ
* REDIRECT_URL : /nines/+escape(document.title)+u,
* REMOTE_ADDR : 72.36.94.103
* REMOTE_PORT : 58936
* REQUEST_METHOD : GET
* REQUEST_URI : /nines/+escape(document.title)+u,
* SCRIPT_FILENAME : /usr/local/patacriticism/production-web/
current/public/dispatch.fcgi
* SCRIPT_NAME : /dispatch.fcgi
* SERVER_ADDR : 128.143.21.77
* SERVER_ADMIN : [EMAIL PROTECTED]
* SERVER_NAME : nines.org
* SERVER_PORT : 80
* SERVER_PROTOCOL : HTTP/1.0
* SERVER_SIGNATURE : <ADDRESS>Apache/1.3.36 Server at
nines.org Port 80</ADDRESS>
* SERVER_SOFTWARE : Apache/1.3.36 (Unix) mod_fastcgi/2.4.2
PHP/4.3.3 mod_perl/1.29
* TZ : US/Eastern
* UNIQUE_ID : RpNgEYCPFU0AACruKkQ
* Process: 10999
* Server :
-------------------------------
Backtrace:
-------------------------------
[RAILS_ROOT]/vendor/rails/actionpack/lib/action_controller/
routing.rb:1292:in `recognize_path'
[RAILS_ROOT]/vendor/rails/actionpack/lib/action_controller/
routing.rb:1282:in `recognize'
[RAILS_ROOT]/vendor/rails/railties/lib/dispatcher.rb:40:in
`dispatch'
[RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:168:in
`process_request'
[RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:143:in
`process_each_request!'
[RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:109:in
`with_signal_handler'
[RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:142:in
`process_each_request!'
/opt/csw/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:612:in
`each_cgi'
/opt/csw/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:609:in `each'
/opt/csw/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:609:in
`each_cgi'
[RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:141:in
`process_each_request!'
[RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:55:in
`process!'
[RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:25:in
`process!'
[RAILS_ROOT]/public/dispatch.fcgi:24
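
A note on reading the report above: the requested URL contains "+"
characters, while the RoutingError message shows spaces in the same
positions. That is consistent with the path being form-decoded ("+"
becomes a space) before route recognition, though the report does not
show the decoding step itself. Below is a minimal Python sketch of that
decoding. It uses the standard library's unquote_plus as a stand-in;
the application is Rails, so this is only an illustration and not the
code that actually ran, and decode_path is a hypothetical helper name.

```python
from urllib.parse import unquote_plus


def decode_path(raw_path: str) -> str:
    """Form-decode a request path: '+' becomes a space, %XX is unescaped.

    Assumption: this mirrors what the routing layer did to the path
    before matching; the report does not confirm the exact mechanism.
    """
    return unquote_plus(raw_path)


# REQUEST_URI exactly as it appears in the report.
requested = "/nines/+escape(document.title)+u,"
decoded = decode_path(requested)
print(repr(decoded))
```

The decoded string has a space wherever the request had a "+", which
matches the path quoted in the "no route found to match" message.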