RE: CrawlTool - fetching only first page

cn Thu, 11 Aug 2005 09:34:45 -0700

1. With -depth 1, only the URLs listed in url.txt are crawled; links found
on those pages are not followed.

2. If some URLs in url.txt are not fetched, check that their syntax is
correct.

3. Check your regex-urlfilter file and make sure its regular expressions
accept the URLs you want to fetch.
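
To illustrate point 3, here is a minimal Python sketch of how I understand the
filter file to be read: one rule per line, a leading "+" accepts and a leading
"-" rejects, the first rule whose pattern is found in the URL decides, and a
URL that matches no rule is dropped. This is a simplified model, not the Nutch
code itself; the function names, the rules, and the example.com domain are all
hypothetical.

```python
import re


def parse_rules(text):
    """Parse filter rules: one per line, '+' accepts, '-' rejects, '#' is a comment."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """The first rule whose pattern is found in the URL decides; no match means reject."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


# Hypothetical rules for illustration only; example.com is a placeholder.
RULES = parse_rules(r"""
# skip common binary file types
-\.(gif|jpg|png|zip|exe)$
# accept pages on the example.com domain
+^http://([a-z0-9]*\.)*example\.com/
# reject everything else
-.
""")

for url in ["http://www.example.com/index.html",
            "http://www.example.com/logo.gif",
            "http://other.org/"]:
    print(url, "->", "fetch" if accepts(RULES, url) else "skip")
```

If the accept rule still names a placeholder domain, or a catch-all reject
rule comes before it, every outlink is filtered out and only the start pages
get fetched, which would look like the behaviour described below.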

Christophe Noel

Quoting Fuad Efendi <[EMAIL PROTECTED]>:

> I loaded the latest code, created nutch-0.7-dev, and ran the command
> bin/nutch crawl url.txt -dir test.crawl -depth 1
> 
> It still does not work. In nutch-0.6, with the same depth and url.txt,
> it works and fetches about 30 files.
> 
> 
> 
> -----Original Message-----
> From: Fuad Efendi [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, August 11, 2005 11:29 AM
> To: [email protected]
> Subject: RE: CrawlTool - fetching only first page
> 
> 
> Yes, I set depth 5 (I noticed that it creates 5 segments).
> It fetches only the main URLs, without the linked pages.
> 
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, August 11, 2005 11:20 AM
> To: [email protected]
> Subject: Re: CrawlTool - fetching only first page
> 
> 
> Did you define a depth?
> What is your exact command?
> 
> It should be something like
> 
> ./nutch crawl urls -dir crawldir -threads 1 -depth 3
> 
> Nils 
> 
> ----- Original Message ----
> From:    Fuad Efendi <[EMAIL PROTECTED]>
> To:      [email protected]
> Date:    11.08.2005 17:16
> Subject: CrawlTool - fetching only first page
> 
> > I configured the classpath to include the \conf\ and \build\ folders
> > (\build\ contains the plugins) and ran CrawlTool without any errors,
> > but it fetches only the first page and does not fetch linked pages.
> > Windows XP.
> > 
> > What is missing?
> > 
> > 
> 



