Hi,
The replication config values specify the number of copies of each block that 
will eventually exist. So, 1 means each block will exist on only 1 node (no 
redundancy); 3 (the default) means that each block will eventually exist on 3 
nodes.
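For reference, this is the shape of the setting in hdfs-site.xml (a minimal sketch; the value shown is the stock default, not a recommendation for any particular cluster):

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication for newly created files.
  Can be overridden per file at create time.</description>
</property>
```

Note that this only sets the default for files created afterwards; it does not change the replication of files that already exist.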

BTW: the job file is copied to HDFS and HDFS takes care of the replication.
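If you want to lower the job-file replication for a single job (say, on a pseudo-distributed setup), something along these lines should work with any tool that goes through GenericOptionsParser. This is a sketch: the jar name, class name, and paths are placeholders, not anything from this thread.

```shell
# Submit a job with the job-file replication overridden to 1.
# my-job.jar, com.example.MyJob, input/ and output/ are placeholders.
hadoop jar my-job.jar com.example.MyJob \
  -D mapred.submit.replication=1 \
  input/ output/
```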


Friso



On 19 May 2011, at 18:44, Steve Cohen wrote:

> One last question about these replication values. If dfs.replication
> and mapred.submit.replication are set to 1, does that mean they get
> copied one time so there are two dfs blocks and two job files or does
> it mean there is one dfs block and one job file?
> 
> Thanks,
> Steve Cohen
> 
> On Thu, May 19, 2011 at 2:43 AM, Friso van Vollenhoven
> <fvanvollenho...@xebia.com> wrote:
>> I believe it's this:
>> 
>> <property>
>>  <name>mapred.submit.replication</name>
>>  <value>10</value>
>>  <description>The replication level for submitted job files.  This
>>  should be around the square root of the number of nodes.
>>  </description>
>> </property>
>> 
>> You can set it per job in the job specific conf and/or in mapred-site.xml.
>> 
>> 
>> Friso
>> 
>> 
>> 
>> On 19 May 2011, at 03:42, Steve Cohen wrote:
>> 
>>> Where is the default replication factor on job files set? Is it different 
>>> than the dfs.replication setting in hdfs-site.xml?
>>> 
>>> Sent from my iPad
>>> 
>>> On May 18, 2011, at 9:10 PM, Joey Echeverria <j...@cloudera.com> wrote:
>>> 
>>>> Did you run a map reduce job?
>>>> 
>>>> I think the default replication factor on job files is 10, which
>>>> obviously doesn't work well on a pseudo-distributed cluster.
>>>> 
>>>> -Joey
>>>> 
>>>> On Wed, May 18, 2011 at 5:07 PM, Steve Cohen <mail4st...@gmail.com> wrote:
>>>>> Thanks for the answer. Earlier, I asked about why I get occasional "not 
>>>>> replicated yet" errors. I had dfs.replication set to one, so what 
>>>>> replication could it have been doing? Did the error messages actually 
>>>>> mean that the file couldn't get created in the cluster?
>>>>> 
>>>>> Thanks,
>>>>> Steve Cohen
>>>>> 
>>>>> 
>>>>> 
>>>>> On May 18, 2011, at 6:39 PM, Todd Lipcon <t...@cloudera.com> wrote:
>>>>> 
>>>>>> Tried to send this, but apparently SpamAssassin finds emails about
>>>>>> "replicas" to be spammy. This time with less rich text :)
>>>>>> 
>>>>>> On Wed, May 18, 2011 at 3:35 PM, Todd Lipcon <t...@cloudera.com> wrote:
>>>>>>> 
>>>>>>> Hi Steve,
>>>>>>> Running setrep will indeed change those files. Changing 
>>>>>>> "dfs.replication" just changes the default replication value for files 
>>>>>>> created in the future. Replication level is a file-specific property.
>>>>>>> Thanks
>>>>>>> -Todd
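[To make the point above concrete: existing files keep the replication level they were created with, so they have to be changed explicitly. A sketch, where the path is a placeholder:]

```shell
# Recursively set replication to 2 on files that already exist.
# /user/steve/data is a placeholder path.
hadoop fs -setrep -R 2 /user/steve/data
```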
>>>>>>> 
>>>>>>> On Wed, May 18, 2011 at 3:32 PM, Steve Cohen <mail4st...@gmail.com> 
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Say I add a datanode to a pseudo cluster and I want to change the
>>>>>>>> replication factor to 2. I see that I can either run hadoop fs -setrep
>>>>>>>> or change the hdfs-site.xml value for dfs.replication. But do either
>>>>>>>> of these cause the existing blocks to replicate?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Steve Cohen
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Joseph Echeverria
>>>> Cloudera, Inc.
>>>> 443.305.9434
>> 
>> 
