rather than protocol-httpclient, but
it didn't make any difference.
Thanks,
Florent
Doug Cutting wrote:
Florent Gluck wrote:
When I inject 25000 urls and fetch them (depth = 1) and do a readdb
-stats, I get:
060110 171347 Statistics for CrawlDb: crawldb
060110 171347 TOTAL urls: 27939
Totally agree, +1
Thanks for the help :)
--Flo
Andrzej Bialecki wrote:
Hi,
During the past year and more Stefan participated actively in the
development, and contributed many high-quality patches. He's been
spending considerable effort on addressing many issues in JIRA, and
proposing
I hope it's not too late to accept my votes. Here they are:
NUTCH-136 mapreduce segment generator generates 50% fewer urls than expected
+1
NUTCH-121 SegmentReader for mapred
+1
NUTCH-108 tasktracker crashes when reconnecting to a new jobtracker.
+1
Thanks,
--Flo
When doing a one-pass crawl, I noticed that when I inject more than
~16000 urls, the fetcher only fetches a subset of the urls initially
injected.
I use 1 master and 3 slaves with the following properties:
mapred.map.tasks = 30
mapred.reduce.tasks = 6
generate.max.per.host = -1
I tried to inject
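For reference, the three properties quoted above would typically be set in conf/nutch-site.xml (or the Hadoop job config); this is just a sketch of that setup with the values from the thread, not a recommended configuration:

```xml
<!-- Fragment for conf/nutch-site.xml; values are the ones quoted in this thread -->
<property>
  <name>mapred.map.tasks</name>
  <value>30</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>6</value>
</property>
<property>
  <name>generate.max.per.host</name>
  <!-- -1 disables the per-host cap when generating a fetch list -->
  <value>-1</value>
</property>
```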
AWESOME !! =:)
Stefan Groschupf wrote:
So, with your patch, did you see 100% of urls *attempting* a fetch?
100% ! :-)