Hi, thank you for your hints, but I didn't give you the following information:
I modified the file crawl-urlfilter.txt as follows:

#start crawl-urlfilter
# skip file:, ftp:, & mailto: urls
-^(file|ftp|mailto):
# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|rtf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe)$
# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]
# accept anything else
+.
#end crawl-urlfilter

I started Nutch with this command line:

bin/nutch crawl urls -dir /home/paul/nutch-searcher.dir -depth 3 >& crawl.log

The file "urls" contains the URL of the following page:

<HTML>
<HEAD>
<TITLE> TitleOfSite </TITLE>
</HEAD>
<FRAMESET ROWS="14%, *">
<FRAME NORESIZE NAME="MENU" SRC="MyServlet?menu=1" SCROLLING =AUTO">
<FRAME NAME="PAGE" SRC="../welcome.html" SCROLLING=AUTO">
</FRAMESET>
</HTML>

Nutch crawls and fetches "welcome.html" but does not handle MyServlet?menu=1. The servlet "MyServlet?menu=1" shows some links, but according to the log Nutch doesn't fetch any of those links.

I hope the question is clear and am looking forward to receiving your answer.

Adriano
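
For anyone who wants to check the filter rules quoted above against the two frame URLs, here is a minimal sketch. It assumes the usual reading of the file: rules are tried in order, the first pattern that matches decides, and a URL that matches nothing is ignored. Python's `re` module stands in for the regex engine Nutch uses, the host `example.com` is hypothetical, and the rule shown as `[EMAIL PROTECTED]` is left out because its real content is not visible in the message.

```python
import re

# Rules copied from the message above, in file order.
# The line shown as "[EMAIL PROTECTED]" is omitted: its content is unknown.
RULES = [
    ("-", r"^(file|ftp|mailto):"),
    ("-", r"\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|rtf|zip|ppt|mpg"
          r"|xls|gz|rpm|tgz|mov|MOV|exe)$"),
    ("+", r"."),
]


def accepts(url, rules=RULES):
    """Return True if the first rule matching the URL is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched, so the URL is ignored


# Hypothetical URLs; the real host is not given in the message.
for url in (
    "http://example.com/app/MyServlet?menu=1",
    "http://example.com/welcome.html",
    "http://example.com/logo.gif",
):
    print(url, "accepted" if accepts(url) else "rejected")
```

To test the omitted rule as well, its real pattern can be added to `RULES` at the same position it has in the file, since order decides the outcome.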