brainstorm
Thu, 03 Jul 2008 11:01:30 -0700
Regarding real-world Nutch clusters (>10 nodes), what approach do you follow to maximise fetch throughput? For instance, my guess is that the "classical" number-crunching (HPC) scientific cluster topology (an intra-cluster private network plus one head node with the "outside world" connection) is suboptimal for a Nutch deployment: the head node becomes a network bottleneck while crawling the Internet. So what do you suggest in that regard? Thanks in advance!