This was on Linux with a Java 6 JVM, ordinary IDE disks, a Gigabit Ethernet
switch with matching NICs, and the HDFS from hadoop 0.9.11. We first wrote a
C program using the native libraries provided in the package, and then
repeated the test with distcp. The scenario was as follows:
We ran the test on a single-node cluster, then added nodes one at a time
until we reached 5 nodes. The same test over Samba saturated the link with
only one node.

--jaf


On 4/16/07, Doug Cutting <[EMAIL PROTECTED]> wrote:

Please use a new subject when starting a new topic.

jafarim wrote:
> Sorry if this is off topic, but we experienced very low bandwidth with
> Hadoop while copying files to/from the cluster (about 1/100 of what we
> got from a plain Samba share). The bandwidth did not improve at all when
> we added nodes to the cluster. At the time I concluded that Hadoop was not
> meant to be used for this purpose, so I did not use it for my project.
> I am just curious how scalable Hadoop is and how bandwidth should grow
> as nodes are added to the cluster.

It's not clear to me what you tried.  Are you running HDFS?  On how
large of a cluster?  What version of Hadoop?  What operating system?
How were you copying files to/from the cluster?

The 'bin/hadoop distcp' command should scale to consume available
network bandwidth and disk i/o.

Doug
