Sorry if this is off topic, but we experienced very low bandwidth with
Hadoop while copying files to/from the cluster (roughly 1/100 of a
plain Samba share). The bandwidth did not improve at all when we added
nodes to the cluster. At the time I concluded that Hadoop was not meant
to be used for this purpose and did not use it for my project.
I am just curious how scalable Hadoop is and how bandwidth should grow
as nodes are added to the cluster.
--jaf

On 4/16/07, Doug Cutting <[EMAIL PROTECTED]> wrote:

Eelco Lempsink wrote:
> Inspired by
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg02394.html
> I'm trying to run Hadoop on multiple CPUs, but without using HDFS.

To be clear: you need some sort of shared filesystem, if not HDFS, then
NFS, S3, or something else.  For example, the job client interacts with
the job tracker by copying files to the shared filesystem named by
fs.default.name, and job inputs and outputs are assumed to come from a
shared filesystem.

So, if you're using NFS, then you'd set fs.default.name to something
like "file:///mnt/shared/hadoop/".  Note also that as your cluster
grows, NFS will soon become a bottleneck.  That's why HDFS is provided:
there aren't other readily available shared filesystems that scale
appropriately.
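
As a minimal sketch of that override, assuming the usual hadoop-site.xml
site configuration file (the mount path below is only an example; use
whatever directory every node mounts at the same location):

<configuration>
  <!-- Point the default filesystem at the shared NFS mount instead of HDFS -->
  <property>
    <name>fs.default.name</name>
    <value>file:///mnt/shared/hadoop/</value>
  </property>
</configuration>

With something like this in place, the job tracker, task trackers, and job
client all read and write job files through that path, so the mount must
appear at the same location on every machine in the cluster.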

Doug
