[Lucene-hadoop Wiki] Update of "RandomWriter" by OwenOMalley

Apache Wiki Wed, 28 Jun 2006 14:38:50 -0700


The following page has been changed by OwenOMalley:
http://wiki.apache.org/lucene-hadoop/RandomWriter

------------------------------------------------------------------------------
  '''RandomWriter''' example
  
- ''RandomWriter" example implements a distributed random writer. Each map 
takes a file name as input and writes one Gig of random data to the DFS 
sequence file by generating multiple records, each with a random key and a 
random value. A mapper does not emit any output and the reduce phase is null.
+ The ''RandomWriter'' example writes random data to DFS using Map/Reduce. Each 
map takes a file name as input and, by default, writes 1 GB of random data to a 
DFS sequence file by generating multiple records, each with a random key and a 
random value. The maps do not emit any output, and the reduce phase is not 
used.
+ 
+ The specifics of the generated data are configurable. The configuration 
variables are:
+ 
+ || Name || Default Value || Description ||
+ || test.randomwriter.maps_per_host || 10 || Number of maps/host ||
+ || test.randomwrite.bytes_per_map || 1024*1024*1024 || Number of bytes written/map ||
+ || test.randomwrite.min_key || 10 || Minimum size of the key in bytes ||
+ || test.randomwrite.max_key || 1000 || Maximum size of the key in bytes ||
+ || test.randomwrite.min_value || 0 || Minimum size of the value in bytes ||
+ || test.randomwrite.max_value || 20000 || Maximum size of the value in bytes ||
+ 
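As a rough illustration of how the size settings in the table interact, here is a minimal Java sketch that draws random key and value sizes within the configured bounds until a byte budget is used up. It does not use the Hadoop API and is not the example's actual code: the class name `RandomRecordSketch`, the helper names, and the choice to treat the maximum sizes as inclusive bounds are all assumptions made for illustration.

```java
import java.util.Random;

public class RandomRecordSketch {
    // Defaults taken from the table above.
    static final long BYTES_PER_MAP = 1024L * 1024 * 1024;
    static final int MIN_KEY = 10;
    static final int MAX_KEY = 1000;
    static final int MIN_VALUE = 0;
    static final int MAX_VALUE = 20000;

    /** Picks a size in [min, max]; the inclusive upper bound is an assumption. */
    static int pickSize(Random rng, int min, int max) {
        return min + rng.nextInt(max - min + 1);
    }

    /** Generates random records until the byte budget is spent; returns the record count. */
    static long countRecords(Random rng, long byteBudget) {
        long remaining = byteBudget;
        long records = 0;
        while (remaining > 0) {
            byte[] key = new byte[pickSize(rng, MIN_KEY, MAX_KEY)];
            byte[] value = new byte[pickSize(rng, MIN_VALUE, MAX_VALUE)];
            rng.nextBytes(key);
            rng.nextBytes(value);
            // A real map would append (key, value) to its sequence file here.
            remaining -= key.length + value.length;
            records++;
        }
        return records;
    }

    public static void main(String[] args) {
        // Use a 1 MB budget instead of BYTES_PER_MAP so the sketch finishes quickly.
        long records = countRecords(new Random(42), 1024L * 1024);
        System.out.println("records generated: " + records);
    }
}
```

With the default bounds a record is at most 21000 bytes, so a full 1 GB map would produce at least about 51,000 records under these assumptions.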
  
  This example uses a useful pattern for dealing with Hadoop's constraints on 
!InputSplits. Each input split can only consist of a file and a byte range, but 
we want to control how many maps there are (and we don't really have any 
inputs). So we create a directory with a set of artificial files, each of which 
contains the filename that we want a given map to write to. Then, using the 
text line reader and this "fake" input directory, we generate exactly the right 
number of maps. Each map gets a single record: the filename to which it is 
supposed to write its output.
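
The "fake" input directory pattern can be sketched in plain Java as follows. This is only an illustration on the local filesystem, not the example's actual code and not the DFS API: the class name `FakeInputDirSketch`, the method names, and the file names `dummy-split-N` and `part-N` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FakeInputDirSketch {
    /**
     * Creates one small control file per desired map under inputDir.
     * Each control file holds a single line: the path that map should write to.
     */
    static List<Path> createControlFiles(Path inputDir, Path outputDir, int numMaps)
            throws IOException {
        Files.createDirectories(inputDir);
        List<Path> controlFiles = new ArrayList<>();
        for (int i = 0; i < numMaps; i++) {
            Path control = inputDir.resolve("dummy-split-" + i); // hypothetical name
            Path target = outputDir.resolve("part-" + i);        // hypothetical name
            Files.write(control, (target + "\n").getBytes(StandardCharsets.UTF_8));
            controlFiles.add(control);
        }
        return controlFiles;
    }

    /** Stand-in for what a map receives: the single line of its control file. */
    static String readTarget(Path controlFile) throws IOException {
        return Files.readAllLines(controlFile, StandardCharsets.UTF_8).get(0);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("randomwriter-sketch");
        List<Path> files = createControlFiles(base.resolve("in"), base.resolve("out"), 3);
        for (Path f : files) {
            System.out.println(f.getFileName() + " -> " + readTarget(f));
        }
    }
}
```

Because each control file is tiny, a line-oriented reader would see it as one split with one record, so the number of files created is the number of maps that run.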
