Hi all, I'm interested in building a solution that uses multiple compute nodes in an EC2 or Rackspace cloud environment to do massively parallel processing in the context of serving HTTP requests, meaning the results need to be aggregated within 1-4 seconds.
From what I gather, Hadoop is designed for job-oriented batch tasks and the minimum job completion time is around 30 seconds. Also, HDFS is meant for storing a few large files rather than many small ones. My question is: is there a framework similar to Hadoop that is designed more for on-demand parallel computing? And is there a technology similar to HDFS that is better at moving small files around and making them available to slave nodes on demand?
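
To make the pattern I'm describing more concrete, here is a rough scatter/gather sketch in Python. The worker hostnames, the /process endpoint, and the 3-second budget are placeholders for illustration only, not part of any existing framework.

# Sketch of fanning a request out to worker nodes and aggregating
# whatever partial results arrive within the latency budget.
import json
import urllib.request
from concurrent import futures

WORKERS = [
    "http://worker-1.internal:8080/process",  # hypothetical cloud nodes
    "http://worker-2.internal:8080/process",
    "http://worker-3.internal:8080/process",
]

def call_worker(url, payload, timeout):
    """Send one shard of the work to a node and return its partial result."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())

def scatter_gather(payload, budget_seconds=3.0):
    """Fan the request out to every worker, keep whatever finishes in time."""
    pool = futures.ThreadPoolExecutor(max_workers=len(WORKERS))
    pending = [pool.submit(call_worker, url, payload, budget_seconds)
               for url in WORKERS]
    results = []
    try:
        for f in futures.as_completed(pending, timeout=budget_seconds):
            try:
                results.append(f.result())
            except Exception:
                pass  # a failed or slow node simply drops out of the aggregate
    except futures.TimeoutError:
        pass  # budget exhausted; return whatever has arrived so far
    pool.shutdown(wait=False)
    return results  # merge/reduce partial results before answering the HTTP request

Something along those lines works for a handful of nodes, but I'd rather not hand-roll the scheduling, distribution, and data-movement layers myself, which is why I'm asking about existing frameworks.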