MapRed Split Size

Mohamed Riadh Trad Thu, 01 Jul 2010 07:33:28 -0700

Hi,

Has anyone addressed the compatibility of 
org.apache.hadoop.mapreduce.lib.input.TextInputFormat with Hadoop Streaming?


The new API throws the following exception when launching Pipes jobs with 
org.apache.hadoop.mapreduce.lib.input.TextInputFormat as the input format 
instead of org.apache.hadoop.mapred.TextInputFormat:


Exception in thread "main" java.lang.RuntimeException: 
java.lang.RuntimeException: class 
org.apache.hadoop.mapreduce.lib.input.TextInputFormat not 
org.apache.hadoop.mapred.InputFormat

My problem with the deprecated classes lies in mapred.min.split.size and the 
number of map tasks.

I need to generate N map tasks on splits of approximately the same size. 
However, when I set mapred.min.split.size to 20 MB, I get splits ranging from 
6 to 64 MB.
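
For context, here is a minimal sketch of how the old-API 
org.apache.hadoop.mapred.FileInputFormat appears to choose split sizes, which 
may explain the spread. It is a standalone re-implementation for illustration, 
not Hadoop code. The class name SplitSizeSketch, the 64 MB block size, the 
file sizes and the requested map count are all assumptions made up for the 
example.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch (not Hadoop code) of the old-API split-size rule:
//   splitSize = max(minSize, min(goalSize, blockSize))
// where goalSize = totalInputSize / requestedNumberOfMaps.
class SplitSizeSketch {
    // A trailing chunk of up to 1.1 x splitSize is kept as a single split.
    private static final double SPLIT_SLOP = 1.1;

    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Lengths of the splits produced for ONE file; files are split independently,
    // so every file contributes its own (possibly small) tail split.
    public static List<Long> splitLengths(long fileLen, long splitSize) {
        List<Long> lengths = new ArrayList<Long>();
        long remaining = fileLen;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining > 0) {
            lengths.add(remaining);
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;      // assumed HDFS block size
        long minSize = 20 * mb;        // mapred.min.split.size from the question
        long totalSize = 1000 * mb;    // hypothetical total input size
        int requestedMaps = 10;        // hypothetical mapred.map.tasks hint

        long goalSize = totalSize / requestedMaps;
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        System.out.println("split size (MB): " + splitSize / mb);

        // Hypothetical input files of 198 MB and 6 MB.
        for (long fileMb : new long[] {198, 6}) {
            StringBuilder sb = new StringBuilder(fileMb + " MB file ->");
            for (long len : splitLengths(fileMb * mb, splitSize)) {
                sb.append(' ').append(len / mb).append(" MB");
            }
            System.out.println(sb);
        }
    }
}
```

Under these assumptions the minimum only matters when the goal size or the 
block size falls below it, so a 20 MB minimum leaves the split size at the 
block size. Small files and the tail of each file still yield smaller splits.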

Any suggestions?

Trad Mohamed Riadh, M.Sc, Ing.
PhD. student
INRIA-TELECOM PARISTECH 

Office: 11-15
Phone: (33)-1 39 63 59 33
Fax: (33)-1 39 63 56 74
Email: Riadh.Trad(a)inria.fr
Home page: http://www-rocq.inria.fr/who/Mohamed.Trad/
