Yes, the client was the namenode and also a datanode. Thanks Raghu, will try not running a datanode on it.
- Prasad.

On Thursday 18 September 2008 12:00:30 am Raghu Angadi wrote:
> pvvpr wrote:
> > The time seemed to be around double the time taken to scp. Didn't realize
> > it could be due to replication.
>
> twice slow is not expected. One possibility is that your client is also
> one of the datanodes (i.e. you are reading from and writing to the same
> disk).
>
> Raghu.
>
> > Regd dfs being faster than scp, the statement came more out of
> > expectation (or wish list) rather than anything else. Since scp is the
> > most elementary way of copying files, was thinking if the network
> > topology of the cluster can be exploited in any way. The only intuition I
> > had was there may be some approaches faster than scp, if any concepts
> > from P2P file sharing are used here. Though I didn't fully explore P2P, I
> > thought there may be some new developments in that area which may be
> > useful here? After napster's centralized way of copying, I think there
> > were quite a bit of improvements? Just thinking loud.
> >
> > - Prasad.
> >
> >> How much slower is 'dfs -put' any way? How large is the file you are
> >> copying?
> >>
> >> > but shouldn't that
> >> > be atleast as fast as copying data to namenode from a single machine,
> >>
> >> It would be "at most" as fast as scp assuming you are not cpu bound. Why
> >> would you think dfs be faster even if it copying to a single replica?
> >>
> >> Raghu.
> >>
> >> Dennis Kubes wrote:
> >>> While an scp will copy data to the namenode machine, it does *not*
> >>> store the data in dfs, it simply copies the data to namenode machine.
> >>> This is the same as copying data to any other machine. The data isn't
> >>> in DFS and is not accessible from DFS. If the box running the namenode
> >>> fails you lose your data.
> >>>
> >>> The reason put is slower is that the data is actually being stored into
> >>> the DFS on multiple machines in block format. It is then accessible
> >>> from programs accessing the DFS such as MR jobs.
> >>>
> >>> Dennis
> >>>
> >>> Prasad Pingali wrote:
> >>>> Hello,
> >>>> I observe that scp of data to the namenode is faster than actually
> >>>> putting into dfs (all nodes coming from same switch and have same
> >>>> ethernet cards, homogenous nodes)? I understand that "dfs -put" breaks
> >>>> the data into blocks and then copies to datanodes, but shouldn't that
> >>>> be atleast as fast as copying data to namenode from a single machine,
> >>>> if not faster?
> >>>>
> >>>> thanks and regards,
> >>>> Prasad Pingali,
> >>>> IIIT Hyderabad.
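
For anyone reading this thread later, here is a rough back-of-the-envelope sketch of the extra work "dfs -put" does compared with a plain scp: the file is split into blocks and every block is stored on several datanodes. The function name put_footprint is made up for illustration, and the 64 MB block size and replication factor of 3 are assumptions, not values reported anywhere in this thread. It only counts blocks and bytes stored; it does not predict timings.

```python
def put_footprint(file_bytes, block_bytes=64 * 1024**2, replication=3):
    """Return (number of blocks, total bytes stored across the cluster).

    block_bytes and replication are assumed values; check your own
    cluster configuration before relying on them.
    """
    # Integer ceiling division: a partial last block still counts as a block.
    blocks = -(-file_bytes // block_bytes)
    # Every byte of the file is stored once per replica.
    return blocks, file_bytes * replication


one_gib = 1024**3
blocks, stored = put_footprint(one_gib)
print(blocks, stored // one_gib)  # 16 3
```

With these assumed settings a 1 GiB file becomes 16 blocks and 3 GiB of data on disk cluster-wide, whereas scp writes 1 GiB to a single disk. If the client is itself a datanode, some of those writes may land on the same disk the file is being read from, which is the situation Raghu describes above.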
