On Thursday 18 September 2008 04:12:13 pm Steve Loughran wrote:
> [EMAIL PROTECTED] wrote:
> > thanks for the replies. So looks like replication might be the real
> > overhead when compared to scp.
>
> Makes sense, but there's no reason why you couldn't have first node you
> copy up the data to, continue and pass that data to the other nodes. If
> its in the same rack, you save on backbone bandwidth, and if it is in a
> different rack, well, the client operation still finishes faster. A
> feature for someone to implement, perhaps?

Yes, I was also wondering what the implications of such a feature would be
if the first node fails or a block gets corrupted there. If that is a
non-issue, this seems like something that could improve performance.

- Prasad.

>
> >> Also dfs put copies multiple replicas unlike scp.
> >>
> >> Lohit
> >>
> >> On Sep 17, 2008, at 6:03 AM, "��明" <[EMAIL PROTECTED]> wrote:
> >>
> >> Actually, No.
> >> As you said, I understand that "dfs -put" breaks the data into blocks and
> >> then copies to datanodes,
> >> but scp do not breaks the data into blocks and, and just copy the data
> >> to the namenode!
> >>
> >>
> >> 2008/9/17, Prasad Pingali <[EMAIL PROTECTED]>:
> >>
> >> Hello,
> >> I observe that scp of data to the namenode is faster than actually
> >> putting
> >> into dfs (all nodes coming from same switch and have same ethernet
> >> cards, homogenous nodes)? I understand that "dfs -put" breaks the data
> >> into blocks
> >> and then copies to datanodes, but shouldn't that be atleast as fast as
> >> copying data to namenode from a single machine, if not faster?
> >>
> >> thanks and regards,
> >> Prasad Pingali,
> >> IIIT Hyderabad.
> >>
> >> --
> >> Sorry for my english!! 明
> >> Please help me to correct my english expression and error in syntax
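
To put rough numbers on the replication overhead discussed in this thread, here is a toy model. It is only a sketch: the function names, the 1000 MB file size, the 100 MB/s link speed and the replica count of 3 are all made-up assumptions, not measurements. It treats every replica as a full sequential copy, which is a worst case and not a description of how HDFS actually schedules replica writes.

```python
# Toy model, not a measurement: every number and name here is an assumption.

def copy_seconds(size_mb, link_mb_per_s):
    """Time to push one full copy of the data over one link."""
    return size_mb / link_mb_per_s


def client_wait_seconds(size_mb, link_mb_per_s, replicas, wait_for_all):
    """How long the client waits before its copy command returns.

    wait_for_all=True : client waits until every replica has been written
                        (modelled as sequential full copies, a worst case).
    wait_for_all=False: client returns once the first node has the data and
                        that node forwards to the others afterwards (the
                        idea suggested in the quoted mail).
    """
    one_copy = copy_seconds(size_mb, link_mb_per_s)
    return one_copy * replicas if wait_for_all else one_copy


if __name__ == "__main__":
    size_mb = 1000.0   # hypothetical file size
    link = 100.0       # hypothetical link speed in MB/s
    print("single copy (scp-like):", copy_seconds(size_mb, link))
    print("wait for 3 replicas:   ", client_wait_seconds(size_mb, link, 3, True))
    print("return after 1st node: ", client_wait_seconds(size_mb, link, 3, False))
```

In this model, returning after the first node makes the client wait no longer than a plain single copy. The failure question raised above is exactly what the model leaves out: if the first node dies before forwarding, only one copy of the data ever existed.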
