Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.

The following page has been changed by AlbertStrasheim:
http://wiki.apache.org/lucene-hadoop/HowManyMapsAndReduces

The comment on the change is:
fixed typo

------------------------------------------------------------------------------
  The number of maps is usually driven by the number of DFS blocks in the
input files, which leads people to adjust their DFS block size to adjust the
number of maps. The right level of parallelism for maps seems to be around
10-100 maps/node, although we have taken it up to 300 or so for very
CPU-light map tasks.
  Task setup takes a while, so it is best if the maps take at least a minute
to execute.
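
For a rough sense of scale, here is a minimal Java sketch (the cluster size
is a hypothetical example, not a figure from this page):

{{{
// Back-of-the-envelope map count, per the 10-100 maps/node guidance above.
public class MapParallelism {
  public static void main(String[] args) {
    int nodes = 100;       // hypothetical cluster size
    int mapsPerNode = 50;  // inside the suggested 10-100 range
    // ~5,000 maps is a reasonable total for a job on such a cluster.
    System.out.println("maps for one job: " + nodes * mapsPerNode);
  }
}
}}}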
  
- Actually controlling the number of maps is subtle. The mapred.map.tasks 
parameter is just a hint to the !InputFormat for the nubmer of maps. The 
default !InputFormat behavior is to split the total number of bytes into the 
right number of fragments. However, the DFS block size of the input files is 
treated as an upper bound for input splits. A lower bound on the split size can 
be set via mapred.min.split.size. Thus, if you expect 10TB of input data and 
have 128MB DFS blocks, you'll end up with 82k maps, unless your 
mapred.map.tasks is even larger.
+ Actually controlling the number of maps is subtle. The mapred.map.tasks 
parameter is just a hint to the !InputFormat for the number of maps. The 
default !InputFormat behavior is to split the total number of bytes into the 
right number of fragments. However, the DFS block size of the input files is 
treated as an upper bound for input splits. A lower bound on the split size can 
be set via mapred.min.split.size. Thus, if you expect 10TB of input data and 
have 128MB DFS blocks, you'll end up with 82k maps, unless your 
mapred.map.tasks is even larger.
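
For example, with the org.apache.hadoop.mapred.JobConf API, both knobs can
be set as below (a minimal sketch; the 100k hint and 256MB lower bound are
arbitrary illustrations, not recommendations):

{{{
import org.apache.hadoop.mapred.JobConf;

public class MapCountHint {
  public static void main(String[] args) {
    JobConf conf = new JobConf(MapCountHint.class);
    // 10TB / 128MB = 10 * 1024 * 1024 / 128 = 81,920 block-sized splits,
    // which is the 82k figure above. The hint below only takes effect if
    // it asks for *more* maps than that.
    conf.setNumMapTasks(100000);  // just a hint to the InputFormat
    // Raising the lower bound on the split size is how you get fewer maps:
    conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);  // 256MB
  }
}
}}}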
  
  == Number of Reduces ==
  
