Has anyone ever used Amazon EC2 to do lots of spidering concurrently? And what about Amazon S3 (Simple Storage Service)?

d e Sun, 11 Mar 2007 06:50:06 -0800

It would seem that one could use
http://wiki.apache.org/lucene-hadoop/AmazonEC2 to fit many hours of
spidering into a single hour by running a bunch of Xen virtual machine
instances in parallel. If one started up instances for the crawl and shut
them down afterwards, it would not seem expensive to do a lot of work
concurrently on a big crawl. Is this practical? Why or why not?


Having a virtual server farm that appears just when I need it, costs me only
10 cents per machine per hour, and costs nothing once my spidering is done
sounds like something I should explore, quite apart from the coolness of it.
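
To make the arithmetic behind that concrete, here is a rough sketch of the cost comparison. It uses only the 10 cents per machine per hour figure above. The function name, the even split of work across instances, and the rounding of partial hours up to full billed hours are my assumptions, and bandwidth and storage charges are left out entirely.

```python
import math

# The 10 cents per machine-hour figure quoted above, kept in whole cents
# so the arithmetic stays exact.
RATE_CENTS_PER_INSTANCE_HOUR = 10


def crawl_cost(total_machine_hours, instances,
               rate_cents=RATE_CENTS_PER_INSTANCE_HOUR):
    """Return (wall_clock_hours, dollars) for a crawl split evenly.

    Hypothetical helper: assumes the crawl divides evenly across
    instances and that each started hour is billed as a full hour.
    """
    wall_clock_hours = total_machine_hours / instances
    billed_hours_each = math.ceil(wall_clock_hours)  # assumption: round up
    total_cents = billed_hours_each * instances * rate_cents
    return wall_clock_hours, total_cents / 100


if __name__ == "__main__":
    for n in (1, 40, 100):
        hours, dollars = crawl_cost(100, n)
        print(f"{n:>3} instances: {hours:6.1f} h wall clock, ${dollars:.2f}")
```

Under these assumptions, 100 machine-hours of crawling costs the same on 100 instances for one hour as on one instance for 100 hours. The cost only rises when the split leaves each instance with a partly used final hour.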

What about actually running the web front-end search site on EC2? Would that
be wise? Would I get the performance I need for the site to feel
responsive?

When they talk about Amazon S3 (Simple Storage Service), is this raw disk
space that they are selling, or backed-up, guaranteed-to-be-there storage?

Does anyone on the list have actual experience with either of these offerings
or with any competing offerings from anyone else?

I assume that Amazon has a huge infrastructure with great connectivity. Do I
really get the benefit of all that in practice if I use EC2? Will it
actually help me keep my search index up to date?

Thanks!
