Hi all, I'm interested in building a solution that uses multiple compute nodes in an EC2 or Rackspace cloud environment to do massively parallel processing in the context of serving HTTP requests, meaning the results need to be aggregated within 1-4 seconds.
From what I gather, Hadoop is designed for job-oriented batch tasks and the minimum job completion time is around 30 seconds. Also, HDFS is meant for storing a few large files rather than many small ones. My question is: is there a framework similar to Hadoop that is designed more for on-demand parallel computing? And is there a technology similar to HDFS that is better at moving small files around and making them available to slave nodes on demand?
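
To make the pattern I'm describing more concrete, here is a rough scatter/gather sketch in Python. The worker hostnames, the /process endpoint, and the 3-second budget are placeholders for illustration only, not part of any existing framework.

# Sketch of fanning a request out to worker nodes and aggregating
# whatever partial results arrive within the latency budget.
import json
import urllib.request
from concurrent import futures

WORKERS = [
    "http://worker-1.internal:8080/process",  # hypothetical cloud nodes
    "http://worker-2.internal:8080/process",
    "http://worker-3.internal:8080/process",
]

def call_worker(url, payload, timeout):
    """Send one shard of the work to a node and return its partial result."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())

def scatter_gather(payload, budget_seconds=3.0):
    """Fan the request out to every worker, keep whatever finishes in time."""
    pool = futures.ThreadPoolExecutor(max_workers=len(WORKERS))
    pending = [pool.submit(call_worker, url, payload, budget_seconds)
               for url in WORKERS]
    results = []
    try:
        for f in futures.as_completed(pending, timeout=budget_seconds):
            try:
                results.append(f.result())
            except Exception:
                pass  # a failed or slow node simply drops out of the aggregate
    except futures.TimeoutError:
        pass  # budget exhausted; return whatever has arrived so far
    pool.shutdown(wait=False)
    return results  # merge/reduce partial results before answering the HTTP request

Something along those lines works for a handful of nodes, but I'd rather not hand-roll the scheduling, distribution, and data-movement layers myself, which is why I'm asking about existing frameworks.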