Hadoop works fine on the local file system; the example apps don't even bother copying anything into HDFS first. But the problem with working with huge numbers of small files on the local filesystem, as Ted mentioned, is IO speed. Hard drives just aren't that fast, no matter how much you spend.
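To make the small-files point a bit more concrete, here is a rough sketch (plain Python, standard library only) of bundling many small files into one big file, so that reading them back is a single sequential pass instead of one open and seek per file. This is not how Hadoop itself does it; the record layout and the names `pack_small_files` and `read_bundle` are made up purely for illustration.

```python
import os
import struct

# Hypothetical record layout, chosen only for this sketch:
# 4-byte name length, 8-byte data length (both big-endian), name, data.
HEADER = struct.Struct(">IQ")

def pack_small_files(src_dir, bundle_path):
    """Concatenate every regular file in src_dir into one bundle file.

    bundle_path should live outside src_dir so the bundle is not
    picked up as one of its own inputs. Returns the number of files packed.
    """
    count = 0
    with open(bundle_path, "wb") as out:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                continue  # skip subdirectories and other non-files
            with open(path, "rb") as f:
                data = f.read()
            encoded = name.encode("utf-8")
            out.write(HEADER.pack(len(encoded), len(data)))
            out.write(encoded)
            out.write(data)
            count += 1
    return count

def read_bundle(bundle_path):
    """Yield (name, data) pairs from the bundle in one sequential pass."""
    with open(bundle_path, "rb") as f:
        while True:
            raw = f.read(HEADER.size)
            if not raw:
                break  # clean end of bundle
            name_len, data_len = HEADER.unpack(raw)
            name = f.read(name_len).decode("utf-8")
            yield name, f.read(data_len)
```

The packing step still pays the per-file cost once, but every later job reads one large file front to back, which is the access pattern spinning disks handle best.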
I would bet that HDFS will chunk those files up and parcel them out to the processes much more efficiently than reading them directly from the local filesystem would. I don't have any numbers to back this up, but since the local filesystem ISN'T tuned for this usage and HDFS IS, it seems reasonable to expect better performance from it.

--
- kate = masukomi
http://weblog.masukomi.org/
