Okay, thanks. I -hoped- it was this way. Sadly, all my files are small (the largest are around 40MB). But oh well!
-j

On May 10, 2011, at 10:46 AM, Matthew Foley wrote:

> Will's right, meta-data transactions go through the Namenode, but all the
> content data read/write activity is directly between Clients and Datanodes,
> and replication activity is Datanode-to-Datanode. No bottlenecks, as long as
> your Namenode has enough RAM to hold the namespace in memory, and enough
> cores to handle a modestly high transaction rate.
>
> And if the individual data files are large (Hadoop-scale "large", that is
> :-) ), you can even decrease the meta-data/data ratio by increasing the block
> size from the default 64MB to 128MB or even 256MB.
>
> --Matt
>
>
> On May 10, 2011, at 6:03 AM, Will Maier wrote:
>
> Hi Jonathan-
>
> On Tue, May 10, 2011 at 05:50:03AM -0700, Jonathan Disher wrote:
>> I will preface this with a couple statements: a) it's almost 6am, and I've
>> been up all night b) I'm drugged up from an allergic reaction, so I may not
>> be firing on all 64 bits.
>>
>> Do I correctly understand the HDFS architecture in that the namenode is a
>> network bottleneck into the system? I.e., it doesn't really matter how many
>> ethernet interfaces I roll into my data nodes, I will always be limited in
>> how much traffic I can drive to the HDFS pool by the network capacity of the
>> namenode?
>
> No. This diagram should help:
>
>    http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html#NameNode+and+DataNodes
>
> The Namenode is a single point of failure, not (under most imaginable
> conditions) a bottleneck.
>
>> I am trying to move a -lot- of data, and I'd like to not throttle the
>> namenode (especially in the old cluster, where I cannot just bond up more
>> interfaces). If there's a way to spread the inbound network (for block
>> writes) traffic I'd love to hear it.
>
> During our (highly distributed) migration, we were writing into HDFS at up to
> 5 GB/s. The more datanodes and writers you have, the faster your aggregate
> throughput.
>
> --
>
> Will Maier - UW High Energy Physics
> cel: 608.438.6162
> tel: 608.263.9692
> web: http://www.hep.wisc.edu/~wcmaier/
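
For anyone reading this thread later, here is a rough sketch in Python of the block-size arithmetic discussed above. It assumes the Namenode's bookkeeping grows with the number of blocks and it ignores replication. The function names and the file sizes (1000 files of 40MB, four files of 10GB) are made up for illustration and do not describe anyone's actual cluster.

```python
MB = 1024 * 1024


def block_count(file_size_bytes, block_size_bytes):
    """Blocks occupied by one file: ceiling of size / block size."""
    if file_size_bytes <= 0:
        return 0  # an empty file has no blocks
    # Integer ceiling division avoids floating-point rounding.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def total_blocks(file_sizes, block_size_bytes):
    """Total blocks the Namenode would track for a list of file sizes."""
    return sum(block_count(size, block_size_bytes) for size in file_sizes)


# Hypothetical workloads, chosen only to illustrate the point.
small_files = [40 * MB] * 1000        # many files smaller than one block
large_files = [10 * 1024 * MB] * 4    # a few 10GB files

for block_mb in (64, 128):
    block_size = block_mb * MB
    print(
        f"block size {block_mb}MB: "
        f"small files -> {total_blocks(small_files, block_size)} blocks, "
        f"large files -> {total_blocks(large_files, block_size)} blocks"
    )
```

With these inputs, each 40MB file occupies exactly one block at either block size, so a larger block size does not reduce the block count for small files. The 10GB files go from 640 blocks to 320, which is the reduction in the meta-data/data ratio that applies to large files.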