Hi Varad,

What is your split size, and how many nodes does your cluster have?
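For context on the split-size question, here is a rough sketch of how the number of map tasks falls out of the split size. It assumes the custom InputFormat extends the new-API FileInputFormat and inherits its getSplits() arithmetic (split size = max(minSize, min(maxSize, blockSize)), with a 10% slop allowed on the last split), and that the input file is splittable. The class SplitEstimate, its helper methods, and the 64 MB block size are illustrative assumptions only; nothing below calls Hadoop itself.

```java
// Illustrative only: mirrors the split arithmetic of FileInputFormat
// in plain Java, without depending on Hadoop.
final class SplitEstimate {
    // The last split may be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    // Same formula as FileInputFormat.computeSplitSize in the new API.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and so map tasks) for one non-empty, splittable file.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the final, possibly short, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 174 * mb; // size of the merged corpus file
        long blockSize = 64 * mb;   // ASSUMED block size, check your configuration

        long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println("splittable file: " + countSplits(fileLength, splitSize) + " splits");

        // If the file is treated as unsplittable, the whole file is one split.
        System.out.println("unsplittable file: " + countSplits(fileLength, fileLength) + " split");
    }
}
```

Under these assumed values the 174 MB file gives only 3 splits, so at most 3 map tasks. If the InputFormat returns false from isSplitable(), the entire file goes to a single mapper.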
I think the issue is with generating the right number of splits, which decides the number of map tasks you will run. Processing all of the data, or most of it, on only a few mappers will not give you the MapReduce advantage of parallelism.

Regards,
Ravi Teja

-----Original Message-----
From: Varad Meru [mailto:varad_m...@persistent.co.in]
Sent: Monday, August 29, 2011 3:30 PM
To: mapreduce-user@hadoop.apache.org
Cc: varad.m...@gmail.com
Subject: Very slow MapReduce Job

Hi,

I wrote a custom InputFormat for parsing the Enron email corpus; it is attached in the file named EmailInputFormat. I have attached the code as a text file, along with a sample input mail as a text document.

The EmailClass implements Writable, including all the methods that requires, and also contains an initiate function to initialize the values in that class. This initiate method is written in EmailClass.java. It is called by the nextKeyValue method, which is written in EmailRecordReader.txt.

------------------------------------
Questions:

1. Is it feasible to build large custom objects within nextKeyValue() to run in Hadoop?
2. An MR program that does the simple task of emitting the message-id and the From-field email-id from the Enron corpus of 6 lakh (600,000) emails merged into one file (174 MB) takes around 50 minutes on a pseudo-distributed (single-node) cluster. This is very slow. Please help me with this as well.
3. Can making the value field in EMailRecordReader static help in this situation?

Thanks in advance,
Varad.
------------------------------------
Varad Meru | Software Engineer
varad_m...@persistent.co.in
Persistent Systems and Solution Ltd. | Partners in Innovation | www.persistentsys.com