On Mar 3, 2011, at 1:23 AM, [email protected] wrote:

> How applicable would Hadoop be to the processing of thousands of large 
> (60-100MB) 3D image files accessible via NFS, using a 100+ machine cluster?
> 
> Does the idea have any merit at all?
> 

It may be a good idea.  If you think the above is a viable architecture for 
data processing, then you likely don't "need" Hadoop because your problem is 
small enough, or you spent way too much money on your NFS server.

Whether or not you "need" Hadoop for data scalability - petabytes of data moved 
at gigabytes a second - is a small aspect of the question.

Hadoop is a good data processing platform in its own right.  Traditional batch 
systems tend to have very Unix-friendly APIs for data processing (you'll find 
yourself writing Perl scripts that create text submit files, shell scripts, and 
C code), but they appear clumsy to "modern developers" (this is speaking as 
someone who lives and breathes batch systems).  Hadoop has "nice" Java APIs and 
is Java-developer friendly, has a lot of data processing concepts built in 
compared to batch systems, and extends OK to other languages.
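
To give a flavor of what I mean by "nice" Java APIs, here is a minimal mapper 
sketch.  The class name and the size-only "processing" are made up for 
illustration; it takes one image path per input record and emits the file's 
size, and that lookup is where your real 3D image code would go:

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Takes one image path per input line and emits (path, size in bytes).
    // The "processing" here is just a file-size lookup so the example
    // compiles on its own; real 3D image work would replace it.
    public class ImageSizeMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {

        @Override
        protected void map(LongWritable offset, Text imagePath, Context context)
                throws IOException, InterruptedException {
            Path p = new Path(imagePath.toString());
            FileSystem fs = p.getFileSystem(context.getConfiguration());
            long bytes = fs.getFileStatus(p).getLen();
            context.write(imagePath, new LongWritable(bytes));
        }
    }

You would pair that with a job driver and an input file listing the paths; the 
point is just how little plumbing the API asks for compared to a classic 
submit-file workflow.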

If you write your image processing in Java, it would be silly not to consider 
Hadoop.  If you currently run a bag full of shell scripts and C++ code, it's a 
tougher decision to make.

Brian
