Hi Jaydeep,

You can use the following command to get statistics for each host when using one database to crawl multiple repositories:

bin/nutch readdb crawldb/crawldb/ -stats -sort

On Mon, Feb 9, 2015 at 12:01 PM, Jaydeep Bagrecha <[email protected]> wrote:

> Thanks.
>
> *P.S*
> The question was:-
> *Given M (repo)repositories(M corresponding seedlist urls),find crawl
> statistics(number of fetched/unfetched urls,etc)for each repo separately?*
>
> So,Is there a way to crawl all M repo together(include eg:-domain name of
> all m in regex-urlfilter.txt file) and get statistics for each one
> individually.
>
> OR
>
> Do we have to crawl each repo separately(include domain name of only 1
> repo in regex-urlfilter.txt)and get its statistics from corresponding
> crawldb?
>
> Thanks,
> Jaydeep Bagrecha
>
> On Feb 8, 2015, at 6:24 PM, Mattmann, Chris A (3980) <
> [email protected]> wrote:
>
> Hi Jaydeep,
>
> Please qualify what this question is about - I know what it’s
> about but you have provided very little detail for anyone else
> on this to list to discern it.
>
> The short answer is no: crawldb stats are per crawl.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Jaydeep Bagrecha <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Sunday, February 8, 2015 at 2:22 PM
> To: "[email protected]" <[email protected]>
> Subject: 572:Crawl statistics for each repository ?
> >
> > Is there a way to crawl all 3 repositories together and get statistics
> > for each one individually?
> >
> > OR
> >
> > Do we have to crawl each repository separately and get its statistics
> > from corresponding crawldb?
> >
> > Thanks,
> > Jaydeep
> >
>

--
Don't Grow Old, Grow Up... :-)
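
If you want the fetched/unfetched counts broken out per repository yourself, one option is to dump the shared crawldb as text (for example with bin/nutch readdb crawldb/crawldb/ -dump <out_dir>) and tally the status of each record by host. Below is a minimal sketch of that idea, not an official Nutch tool. It assumes each dump record starts with a "<url><TAB>Version: N" line followed by a "Status: N (db_xxx)" line; please check that against the dump produced by your Nutch version. The function name, the sample text, and the example.org hosts are made up for illustration.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Assumed layout of one record in a plain-text crawldb dump:
#   <url>\tVersion: 7
#   Status: 2 (db_fetched)
#   ...more fields...
URL_LINE = re.compile(r"^(\S+)\s+Version:")
STATUS_LINE = re.compile(r"^Status:\s*\d+\s*\((\w+)\)")


def per_host_status_counts(dump_text):
    """Return {host: {status_name: count}} for a crawldb text dump."""
    counts = defaultdict(Counter)
    host = None  # host of the record currently being read
    for line in dump_text.splitlines():
        url_match = URL_LINE.match(line)
        if url_match:
            host = urlsplit(url_match.group(1)).hostname
            continue
        status_match = STATUS_LINE.match(line)
        if status_match and host is not None:
            counts[host][status_match.group(1)] += 1
            host = None  # count at most one status per record
    return {h: dict(c) for h, c in counts.items()}


# Hypothetical sample in the assumed dump format (two repositories).
SAMPLE = (
    "http://repo-a.example.org/\tVersion: 7\n"
    "Status: 2 (db_fetched)\n"
    "Retries since fetch: 0\n"
    "\n"
    "http://repo-a.example.org/data\tVersion: 7\n"
    "Status: 1 (db_unfetched)\n"
    "\n"
    "http://repo-b.example.org/\tVersion: 7\n"
    "Status: 2 (db_fetched)\n"
)

if __name__ == "__main__":
    for host, statuses in sorted(per_host_status_counts(SAMPLE).items()):
        print(host, statuses)
```

Grouping by host only separates the repositories cleanly if each one lives on its own host; if two repositories share a host you would need to group by URL prefix instead.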

