Thank you for your hints, but I didn't give you the following information:

I modified the file crawl-urlfilter.txt as follows:
#start crawl-urlfilter
# skip file:, ftp:, & mailto: urls

# skip image and other suffixes we can't yet parse

# skip URLs containing certain characters as probable queries, etc.

# accept anything else
#end crawl-urlfilter
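
For illustration only, the following minimal Python sketch shows how a rule list of this shape is usually evaluated: rules are tried top to bottom, and the first pattern found in the URL decides whether it is accepted (+) or skipped (-). The patterns below are assumptions modelled on the comment headings above, not the actual contents of the modified file, and example.com is a placeholder host. The function name accepts is hypothetical as well.

```python
import re

# Hypothetical rules, one per comment heading in the filter file.
# These patterns are assumptions; the real file may differ.
RULES = [
    ("-", r"^(file|ftp|mailto):"),       # skip file:, ftp:, & mailto: urls
    ("-", r"\.(gif|jpg|png|css|zip)$"),  # skip image and other suffixes
    ("-", r"[?*!@=]"),                   # skip probable queries
    ("+", r"."),                         # accept anything else
]

def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern is found in url is '+'."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched, so the URL is skipped

for url in ("http://example.com/welcome.html",
            "http://example.com/MyServlet?menu=1"):
    print(url, "->", "accepted" if accepts(url) else "skipped")
```

Under these assumed rules, a URL containing "?" or "=" is skipped by the third rule before the final accept rule is reached, so it matters whether such a rule is still active in the modified file.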

I started Nutch with this command line:
bin/nutch crawl urls -dir /home/paul/nutch-searcher.dir -depth 3 >& crawl.log

The file "urls" contains the URL of the following page:


<TITLE>  TitleOfSite </TITLE>

<FRAMESET ROWS="14%, *">


<FRAME NAME="PAGE"  SRC="../welcome.html" SCROLLING=AUTO">



Nutch crawls and fetches "welcome.html" but does not fetch MyServlet?menu=1.
The servlet "MyServlet?menu=1" displays some links, but according to the log
Nutch does not fetch any of them.
I hope the question is clear, and I look forward to your answer.

