Hi,
Can somebody help with preparing our MapReduce settings?
We recently set up a 10-node Hadoop cluster and have 10 TB of data, mostly
small zipped XML files. This is our initial POC.

I would like to know a few things about the steps to be followed:

1. As per my understanding, we upload these 10 TB of data to the NameNode.
2. Since the files are very small (on average 100 KB XML files), how should we
combine them? Can we combine them into chunks of roughly 1 TB each, so that it
would come to 10 files?
3. We set the block size to 128 MB.
4. Do we need to put these 10 files of 1 TB each into separate folders, or all
10 files in just one folder?
5. In the case of 4 above, if it is only 1 folder holding 10 files of 1 TB
each, we run MapReduce on that folder. Is it better to run 10 MapReduce jobs,
one per folder (in the case of 10 folders), or just one MapReduce job on 1
folder?
6. In case we run 1 MapReduce job on 1 folder holding 10 files of 1 TB each,
I calculate the map tasks as 10 * 1 * 1024 * 1024 / 128 = 81920 mappers. Can
the system sustain that many mappers?
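The arithmetic in point 6 can be sketched as a quick sanity check. This assumes the usual default of one map task per HDFS block and splittable input files; note that gzip-compressed files are not splittable, so in practice each zipped file may get only a single mapper regardless of its size:

```python
# Estimate of map tasks for the setup described above, assuming one
# map task per HDFS block and splittable input. gzip-compressed files
# are NOT splittable, so real mapper counts can differ.

TB_IN_MB = 1024 * 1024      # 1 TB expressed in MB
block_size_mb = 128         # HDFS block size from point 3
num_files = 10              # 10 files of 1 TB each
file_size_mb = 1 * TB_IN_MB

total_mb = num_files * file_size_mb
mappers = total_mb // block_size_mb
print(mappers)  # 81920 map tasks for 10 TB at 128 MB blocks
```

This matches the 81920 figure in point 6, but only under the splittable-input assumption.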
Thanks,
rai

