Hello,

is there a reason why Nutch-based crawlers request the targets of POST
forms while traversing links?


turingc.cs.washington.edu - - [14/Sep/2005:22:09:57 +0200] "GET
/lina/cgi-bin/freefire-mail.cgi HTTP/1.0" 302 230 "-" "NutchCVS/0.8-dev
(Nutch; http://lucene.apache.org/nutch/bot.html;
nutch-agent@lucene.apache.org)"
turingc.cs.washington.edu - - [14/Sep/2005:22:10:04 +0200] "GET
/lina/freefire-l/index.en.html HTTP/1.0" 302 204 "-" "NutchCVS/0.8-dev
(Nutch; http://lucene.apache.org/nutch/bot.html;
nutch-agent@lucene.apache.org)"

The log above is from www.freefire.org, which contains a form:

<form method="POST"
ACTION="http://sites.inka.de/lina/cgi-bin/freefire-mail.cgi">

which leads to the two requests logged above.

So it is fetching the form's POST action URL with a GET request instead
of ignoring the form?
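
For illustration, here is a rough sketch of the behaviour I would have
expected from a crawler's link extraction. This is not Nutch's actual
parser: the names LinkExtractor and extract_outlinks are hypothetical,
and Python's standard html.parser is used only to keep the example
short. The assumption is that anchors and GET forms yield fetchable
URLs, while the action of a POST form is skipped.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical outlink extractor: keeps <a href>, skips POST form actions."""

    def __init__(self):
        super().__init__()
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so ACTION arrives as "action".
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.outlinks.append(attrs["href"])
        elif tag == "form" and attrs.get("action"):
            # A form without a method attribute defaults to GET.
            method = (attrs.get("method") or "get").lower()
            # Assumption: only a GET form's action is worth fetching as a link.
            if method != "post":
                self.outlinks.append(attrs["action"])


def extract_outlinks(html):
    """Return the URLs a polite crawler might queue from one HTML page."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.outlinks
```

With a check like this, the form on www.freefire.org would contribute
no outlink, so freefire-mail.cgi would never be requested.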

Regards
Bernd
-- 
  (OO)     -- [EMAIL PROTECTED] --
 ( .. )    [EMAIL PROTECTED],linux.de,debian.org}  http://www.eckes.org/
  o--o   1024D/E383CD7E  [EMAIL PROTECTED]  v:+497211603874  f:+49721151516129
(O____O)  When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl!
