Dear friends, I am new to Hadoop and, I must say, I just want to use it as a map-and-reduce framework.
I've developed an application to run on a server with 8 CPUs, and everything seems to work properly except for the performance: it doesn't use all the CPU power.

I'm trying to process 200,000 documents, extracting some annotations from each document in the mapper (first and last names) and merging them in the reduce task (if I find a first name and a last name together, that gives me a full name). I've developed my own record reader because I want to get the URI of each document I process, so that record reader uses the URI as the key and the document content as the value. Here is the most important method (in my opinion):

I should also say that I'm not running the application through the bin/hadoop script but by invoking the java command directly, because I wasn't able to get the script to work.

So, could you help me use all the power of my CPUs? Thanks in advance.

Pedro
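To clarify what my record reader does, here is the core logic stripped of the Hadoop API (just a simplified sketch; `WholeDocumentReader` and `readRecord` are illustrative names, the real class extends Hadoop's `RecordReader` and emits `Text` keys and values):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Simplified sketch of my record-reader logic: each input document becomes
// exactly one record, keyed by its URI, with the full file content as the
// value. (Illustrative only; the real class plugs into the Hadoop API.)
public class WholeDocumentReader {

    // Read a single document and emit it as one (URI, content) record.
    static Map.Entry<String, String> readRecord(Path doc) throws IOException {
        String key = doc.toUri().toString();   // key: the document's URI
        String value = Files.readString(doc);  // value: the whole document
        return new SimpleEntry<>(key, value);
    }

    public static void main(String[] args) throws IOException {
        // Small demonstration on a temporary file.
        Path tmp = Files.createTempFile("doc", ".txt");
        Files.writeString(tmp, "John Smith wrote this document.");
        Map.Entry<String, String> record = readRecord(tmp);
        System.out.println(record.getKey());
        System.out.println(record.getValue());
        Files.delete(tmp);
    }
}
```

The point is that the whole document is one value, so one map call processes one document end to end.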
