Hi,

The awk script we wrote for our function took only 40 seconds to process 174 MB of data, whereas the MR code takes more than 1 hour when run in Eclipse. We also ran a simple text-processing MR job as a check, and it finished in 4 minutes in Eclipse. I am running in Eclipse to verify the functionality of our program.

My split size is the default split size (I called split.getLength() and it showed 332K bytes ~ 32 MB; I don't know why it wasn't 64 MB).
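As a rough sanity check on the split size mentioned above, the sketch below estimates how many splits (and so how many map tasks) a 174 MB file would produce. It assumes one map per split and the stock FileInputFormat behaviour of allowing about 10% slack on the final split; the class and method names (SplitEstimate, estimateSplits) are made up for illustration and are not part of Hadoop.

```java
// Hypothetical helper, not a Hadoop API: estimates the number of input splits
// for a single file, assuming one map task per split.
final class SplitEstimate {

    // Assumption: mirrors the ~10% slack FileInputFormat allows on the last split.
    private static final double SPLIT_SLOP = 1.1;

    static long estimateSplits(long fileBytes, long splitBytes) {
        if (splitBytes <= 0) {
            throw new IllegalArgumentException("splitBytes must be positive");
        }
        long splits = 0;
        long remaining = fileBytes;
        // Carve off full-size splits while clearly more than one split remains.
        while ((double) remaining / splitBytes > SPLIT_SLOP) {
            splits++;
            remaining -= splitBytes;
        }
        // Whatever is left (up to 1.1 x splitBytes) becomes the final split.
        if (remaining > 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileBytes = 174 * mb; // file size quoted in this thread
        System.out.println("32 MB splits: " + estimateSplits(fileBytes, 32 * mb));
        System.out.println("64 MB splits: " + estimateSplits(fileBytes, 64 * mb));
    }
}
```

Under these assumptions the 174 MB file yields 6 splits at 32 MB and 3 splits at 64 MB, so the split size alone would change the map count only modestly.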
I have written a basic parser using if-else statements inside the EMailClass, which implements Writable.

-----Original Message-----
Hi Varad,

What is your split size, and how many nodes are in the cluster you are running? I think the issue is with generating the right number of splits, which decides the number of maps you will run. Processing all of the data, or most of it, on a few mappers will not give you the MapReduce advantage of parallelism.

Regards,
Ravi Teja

-----Original Message-----
From: Varad Meru [mailto:varad_m...@persistent.co.in]
Sent: Monday, August 29, 2011 3:30 PM
To: mapreduce-user@hadoop.apache.org
Cc: varad.m...@gmail.com
Subject: Very slow MapReduce Job

Hi,

I wrote a custom InputFormat for parsing the Enron email corpus; it is attached in the file named EmailInputFormat. I have attached the code in a text file, with a sample input mail also attached as a text document.

The EmailClass implements Writable, including all the methods that interface requires, and also contains an initiate function to initialize the values in that class. This initiate method is written in EmailClass.java. It is called by the nextKeyValue method, which is written in EmailRecordReader.txt.

------------------------------------
Question:
1. Is it feasible to build large custom objects within nextKeyValue() to run in Hadoop?
2. An MR program that does the simple task of emitting the message-id and the from-field email-id from the Enron corpus of 6 lakh (600,000) emails merged into one file (174 MB) takes around 50 minutes on a pseudo-distributed cluster. This is very slow. Please help me with this too.
3. Can a static field for the value in EMailRecordReader help in this situation?

Thanks in advance,
Varad.
------------------------------------
Varad Meru | Software Engineer
varad_m...@persistent.co.in
Persistent Systems and Solution Ltd.
| Partners in Innovation | www.persistentsys.com

DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
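
On questions 1 and 3 in the original post: a common pattern in record readers is to keep one value object as an instance field and refill it for every record, rather than allocating a new large object per call to nextKeyValue(); a static field is not needed for that. The sketch below only illustrates the idea in plain Java, without Hadoop types. It is not the attached EmailClass or EmailRecordReader code, which is not reproduced in this thread; the class name ReusableEmail and its methods are hypothetical, and the sample message in main is invented. It reads just the header block, since only the message-id and from fields are emitted.

```java
// Hypothetical sketch: ReusableEmail is an illustrative name, not the
// EmailClass from the attachments. One instance is meant to be reused
// for every record instead of being re-created per record.
final class ReusableEmail {
    private static final String MESSAGE_ID = "Message-ID:";
    private static final String FROM = "From:";

    private String messageId = "";
    private String from = "";

    /** Clears the fields so the same instance can hold the next record. */
    void reset() {
        messageId = "";
        from = "";
    }

    /**
     * Scans only the header block (everything before the first blank line)
     * and fills in the Message-ID and From fields; the body is never parsed.
     */
    void parseHeaders(String rawMessage) {
        reset();
        int pos = 0;
        int len = rawMessage.length();
        while (pos < len) {
            int eol = rawMessage.indexOf('\n', pos);
            if (eol < 0) {
                eol = len;
            }
            String line = rawMessage.substring(pos, eol);
            if (line.endsWith("\r")) {
                line = line.substring(0, line.length() - 1); // tolerate CRLF
            }
            pos = eol + 1;
            if (line.isEmpty()) {
                break; // blank line marks the end of the headers
            }
            if (line.startsWith(MESSAGE_ID)) {
                messageId = line.substring(MESSAGE_ID.length()).trim();
            } else if (line.startsWith(FROM)) {
                from = line.substring(FROM.length()).trim();
            }
            if (!messageId.isEmpty() && !from.isEmpty()) {
                break; // both fields found, stop scanning early
            }
        }
    }

    String getMessageId() { return messageId; }

    String getFrom() { return from; }

    public static void main(String[] args) {
        ReusableEmail email = new ReusableEmail(); // created once, reused per record
        // Invented sample input, not taken from the corpus.
        email.parseHeaders("Message-ID: <1.example@host>\nFrom: alice@example.com\n\nbody text");
        System.out.println(email.getMessageId() + "\t" + email.getFrom());
    }
}
```

Whether this addresses the 50-minute runtime depends on what the attached parser actually does per record, so it is worth profiling before and after any such change.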