Try running hadoop dfs -put on each of the machines that holds content. That gives you good balance and should let you write at very high aggregate speed (depending on your cluster size).
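For example, a minimal sketch of the per-machine upload (the local path /data/logs, the target directory name "input", and the per-host naming are hypothetical; it assumes every node has a configured Hadoop client):

    # create the shared target directory once, from any one node
    ./hadoop dfs -mkdir input

    # then run this on every machine that holds part of the data;
    # HDFS places the first replica of each block on the local
    # datanode, so uploading from many nodes in parallel spreads
    # the blocks and multiplies the aggregate write throughput
    ./hadoop dfs -put /data/logs input/logs-$(hostname)

Giving each node its own subdirectory (logs-$(hostname) here) keeps the uploads from colliding if the local filenames repeat across machines.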
On 3/28/08 8:27 AM, "Jean-Pierre" <[EMAIL PROTECTED]> wrote:

> Hello
>
> I'm not sure I've understood... actually I've already set this field in
> the configuration file. I think this field just specifies the master
> for HDFS.
>
> My problem is that I have many machines, each with a bunch of files
> which represent the distributed data... and I want to use this
> distribution of data with Hadoop. Maybe there is another configuration
> file which allows me to tell Hadoop how to use my file distribution.
> Is that possible? Or should I look at adapting my distribution of data
> to the Hadoop one?
>
> Anyway, thanks for your answer, Peeyush.
>
> On Fri, 2008-03-28 at 16:22 +0530, Peeyush Bishnoi wrote:
>> Hello,
>>
>> Yes, you can do this by specifying in hadoop-site.xml the location of
>> the namenode where your data has already been distributed.
>>
>> ---------------------------------------------------------------
>> <property>
>>   <name>fs.default.name</name>
>>   <value> <IPAddress:PortNo> </value>
>> </property>
>> ---------------------------------------------------------------
>>
>> Thanks
>>
>> ---
>> Peeyush
>>
>>
>> On Thu, 2008-03-27 at 15:41 -0400, Jean-Pierre wrote:
>>
>>> Hello,
>>>
>>> I'm working on a large amount of logs, and I've noticed that
>>> distributing the data over the network (./hadoop dfs -put input input)
>>> takes a lot of time.
>>>
>>> Let's say that my data is already distributed among the machines: is
>>> there any way to tell Hadoop to use the already existing
>>> distribution?
>>>
>>> Thanks
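For completeness, a filled-in version of the property Peeyush showed might look like the following (the hostname and port are hypothetical, and depending on the Hadoop version the value may be written with or without the hdfs:// scheme):

    ---------------------------------------------------------------
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode.example.com:9000</value>
    </property>
    ---------------------------------------------------------------

Note this only tells clients which namenode to talk to; it does not import files that already sit on the local disks of your machines, which is why the parallel dfs -put above is still needed.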
