We ran on Linux with JVM 6, ordinary IDE disks, and a gigabit Ethernet switch with matching NICs, using the HDFS from hadoop 0.9.11. We first wrote a C program using the native libs provided in the package, and then tested again with distcp. The scenario was as follows: we ran the test on a cluster with 1 node, then added nodes one by one until we reached 5 nodes. The same test with Samba saturated the link with only one node.
--jaf

On 4/16/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
Please use a new subject when starting a new topic.

jafarim wrote:
> Sorry if being off topic, but we experienced a very low bandwidth with
> hadoop while copying files to/from the cluster (some 1/100 comparing to
> plain samba share). The bandwidth did not improve at all by adding nodes to
> the cluster. At that time I thought that hadoop is not supposed to be used
> for this purpose and did not use it for my project.
> I am just curious how much scalable hadoop is and how bandwidth should grow
> as nodes are added to the cluster.

It's not clear to me what you tried. Are you running HDFS? On how large of a cluster? What version of Hadoop? What operating system? How were you copying files to/from the cluster?

The 'bin/hadoop distcp' command should scale to consume available network bandwidth and disk i/o.

Doug
