Hi Everyone,
I am currently attempting to run a MapReduce job whose input
comes from HBase. The input table has 22 regions, and thus creates 22
map tasks. This creates an issue, since so few map tasks
result in a poor distribution of labor on a cluster of 10+ machines,
especially since the amount of work required varies widely
from region to region.
I would like to at least double the number of map tasks;
the relevant code seems to be in TableInputFormat.
//Original code
Text[] startKeys = m_table.getStartKeys();
if (startKeys == null || startKeys.length == 0) {
  throw new IOException("Expecting at least one region");
}
InputSplit[] splits = new InputSplit[startKeys.length];
for (int i = 0; i < startKeys.length; i++) {
  splits[i] = new TableSplit(m_tableName, startKeys[i],
      ((i + 1) < startKeys.length) ? startKeys[i + 1] : new Text());
}
//end-original
//Modified code
//Note: this assumes row keys are integers in string form. The first
//region's start key and the last region's end key are empty, so those
//regions are left as a single split. Using a List avoids the index
//arithmetic bugs of writing splits[i] and splits[i+1] in the same pass.
Text[] startKeys = m_table.getStartKeys();
if (startKeys == null || startKeys.length == 0) {
  throw new IOException("Expecting at least one region");
}
List<InputSplit> splits = new ArrayList<InputSplit>();
for (int i = 0; i < startKeys.length; i++) {
  Text endKey = ((i + 1) < startKeys.length) ? startKeys[i + 1] : new Text();
  if (startKeys[i].getLength() == 0 || endKey.getLength() == 0) {
    // no numeric boundary available; keep the region as one split
    splits.add(new TableSplit(m_tableName, startKeys[i], endKey));
  } else {
    int start = Integer.parseInt(startKeys[i].toString());
    int end = Integer.parseInt(endKey.toString());
    Text halfsplit = new Text(Integer.toString((start + end) / 2));
    splits.add(new TableSplit(m_tableName, startKeys[i], halfsplit));
    splits.add(new TableSplit(m_tableName, halfsplit, endKey));
  }
}
return splits.toArray(new InputSplit[splits.size()]);
//end-modified
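To sanity-check the midpoint logic outside of Hadoop, here is a small self-contained sketch (plain Java, no HBase classes; the class and method names are made up for illustration, and it assumes integer-valued row keys, the same assumption the Integer.parseInt call above makes):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Given region start keys (the last region is open-ended), return the
    // boundary keys after inserting a midpoint into each bounded region.
    static List<String> halveRegions(String[] startKeys) {
        List<String> boundaries = new ArrayList<String>();
        for (int i = 0; i < startKeys.length; i++) {
            boundaries.add(startKeys[i]);
            if (i + 1 < startKeys.length) {
                int start = Integer.parseInt(startKeys[i]);
                int end = Integer.parseInt(startKeys[i + 1]);
                // midpoint of the region [start, end)
                boundaries.add(Integer.toString((start + end) / 2));
            }
        }
        return boundaries;
    }

    public static void main(String[] args) {
        // three regions starting at 0, 100, 200; the last is open-ended
        System.out.println(halveRegions(new String[] {"0", "100", "200"}));
        // prints [0, 50, 100, 150, 200]
    }
}
```

Note that real HBase row keys are arbitrary byte arrays, so this only works if your keys happen to be decimal strings.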
It seems like the required modifications would be something along the
lines of the code written above. Is this the correct/best way to go
about this?
Thanks,
Jonathan