Sorry if this is off topic, but we experienced very low bandwidth with Hadoop while copying files to/from the cluster (roughly 1/100 of a plain Samba share). The bandwidth did not improve at all when we added nodes to the cluster. At the time I concluded that Hadoop was not meant for this purpose and did not use it for my project. I am just curious how scalable Hadoop is and how the bandwidth should grow as nodes are added to the cluster. --jaf
On 4/16/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
Eelco Lempsink wrote:
> Inspired by
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg02394.html
> I'm trying to run Hadoop on multiple CPU's, but without using HDFS.

To be clear: you need some sort of shared filesystem, if not HDFS, then NFS, S3, or something else. For example, the job client interacts with the job tracker by copying files to the shared filesystem named by fs.default.name, and job inputs and outputs are assumed to come from a shared filesystem. So, if you're using NFS, then you'd set fs.default.name to something like "file:///mnt/shared/hadoop/".

Note also that as your cluster grows, NFS will soon become a bottleneck. That's why HDFS is provided: there aren't other readily available shared filesystems that scale appropriately.

Doug
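
For anyone following along, here is a minimal sketch of what that setting might look like in hadoop-site.xml. The mount point /mnt/shared/hadoop is just the example path from Doug's message; substitute whatever directory all of your nodes have mounted over NFS.

    <!-- hadoop-site.xml: point Hadoop at a shared NFS mount instead of HDFS.
         Every node in the cluster must see the same directory at this path. -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>file:///mnt/shared/hadoop/</value>
      </property>
    </configuration>

With this in place, job inputs, outputs, and the job tracker's staging files all go through the NFS mount, which is why Doug notes it becomes a bottleneck as the cluster grows.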
