Try running hadoop dfs -put on each of the machines that holds content. That gives you good balance and should let you write at very high aggregate speed (depending on your cluster size).
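For example, a minimal sketch of the per-machine upload (the local path /data/logs, the target directory name "input", and the per-host naming are hypothetical; it assumes every node has a configured Hadoop client):

    # create the shared target directory once, from any one node
    ./hadoop dfs -mkdir input

    # then run this on every machine that holds part of the data;
    # HDFS places the first replica of each block on the local
    # datanode, so uploading from many nodes in parallel spreads
    # the blocks and multiplies the aggregate write throughput
    ./hadoop dfs -put /data/logs input/logs-$(hostname)

Giving each node its own subdirectory (logs-$(hostname) here) keeps the uploads from colliding if the local filenames repeat across machines.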
On 3/28/08 8:27 AM, "Jean-Pierre" <[EMAIL PROTECTED]> wrote:

> Hello
>
> I'm not sure I've understood... actually I've already set this field in
> the configuration file. I think this field just specifies the master
> for HDFS.
>
> My problem is that I have many machines, each with a bunch of files
> which represent the distributed data... and I want to use this
> distribution of data with Hadoop. Maybe there is another configuration
> file which allows me to tell Hadoop how to use my file distribution.
> Is that possible? Or should I look at adapting my distribution of data
> to the Hadoop one?
>
> Anyway, thanks for your answer, Peeyush.
>
> On Fri, 2008-03-28 at 16:22 +0530, Peeyush Bishnoi wrote:
>> Hello,
>>
>> Yes, you can do this by specifying in hadoop-site.xml the location of
>> the namenode where your data has already been distributed.
>>
>> ---------------------------------------------------------------
>> <property>
>>   <name>fs.default.name</name>
>>   <value> <IPAddress:PortNo> </value>
>> </property>
>> ---------------------------------------------------------------
>>
>> Thanks
>>
>> ---
>> Peeyush
>>
>>
>> On Thu, 2008-03-27 at 15:41 -0400, Jean-Pierre wrote:
>>
>>> Hello,
>>>
>>> I'm working on a large amount of logs, and I've noticed that
>>> distributing the data over the network (./hadoop dfs -put input input)
>>> takes a lot of time.
>>>
>>> Let's say that my data is already distributed among the machines: is
>>> there any way to tell Hadoop to use the already existing
>>> distribution?
>>>
>>> Thanks
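For completeness, a filled-in version of the property Peeyush showed might look like the following (the hostname and port are hypothetical, and depending on the Hadoop version the value may be written with or without the hdfs:// scheme):

    ---------------------------------------------------------------
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode.example.com:9000</value>
    </property>
    ---------------------------------------------------------------

Note this only tells clients which namenode to talk to; it does not import files that already sit on the local disks of your machines, which is why the parallel dfs -put above is still needed.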
