Re: weird fetcher behavior

2006-01-11 Thread Florent Gluck
rather than protocol-httpclient, but it didn't make any difference.
Thanks,
Florent
Doug Cutting wrote:
Florent Gluck wrote:
When I inject 25000 urls and fetch them (depth = 1) and do a readdb -stats, I get:
060110 171347 Statistics for CrawlDb: crawldb
060110 171347 TOTAL urls: 27939

Re: [VOTE] Committer access for Stefan Groschupf

2005-12-16 Thread Florent Gluck
Totally agree, +1
Thanks for the help :)
--Flo
Andrzej Bialecki wrote:
Hi, During the past year and more Stefan participated actively in the development, and contributed many high-quality patches. He's been spending considerable effort on addressing many issues in JIRA, and proposing

Re: vote for issues to fix in 0.7.2

2005-12-15 Thread Florent Gluck
I hope it's not too late to accept my votes. Here they are:
NUTCH-136 mapreduce segment generator generates 50 % less than excepted urls +1
NUTCH-121 SegmentReader for mapred +1
NUTCH-108 tasktracker crashs when reconnecting to a new jobtracker. +1
Thanks,
--Flo

mapreduce fetcher doesn't fetch all urls

2005-12-14 Thread Florent Gluck
When doing a one-pass crawl, I noticed that when I inject more than ~16000 urls, the fetcher only fetches a subset of the set initially injected. I use 1 master and 3 slaves with the following properties:
mapred.map.tasks = 30
mapred.reduce.tasks = 6
generate.max.per.host = -1
I tried to inject

Re: mapreduce fetcher doesn't fetch all urls

2005-12-14 Thread Florent Gluck
AWESOME !! =:)
Stefan Groschupf wrote:
So, with your patch, did you see 100% of urls *attempting* a fetch ?
100% ! :-)