Hi! :) I am a total beginner in both Java and Hadoop MapReduce. My current goal is to understand what Hadoop does internally while it processes a MapReduce application. So far I know two things:

1. I'd be better off using log4j to trace the code.
2. Debugging a distributed system is, in general, not a good idea.

So my plan is to use log4j: I want to put loggers into the methods that get involved while Hadoop processes the MapReduce application. But here I am stuck, because I don't know which methods those are. It's a chicken-and-egg problem -- I want to find out which methods are involved (and what happens in them), so I plan to use log4j; but in order to use log4j, I first have to know which methods are involved so that I know where to put my loggers.

Is there any way to trace the call stack? Or maybe I simply don't know how to use log4j cleverly -- could I just put a logger inside the main method of a MapReduce application (say, WordCount.java) and have it magically record all the trace information for me?
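For concreteness, here is the only kind of logging I know how to write so far -- a minimal sketch based on the WordCount mapper from the Hadoop tutorial. The log4j lines and the class name TracingWordCountMapper are my own additions:

```java
// A minimal sketch of where I would put loggers, based on the WordCount
// mapper from the Hadoop tutorial. The log4j Logger and the class name
// TracingWordCountMapper are my own additions.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;

public class TracingWordCountMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    // One logger per class, named after the class, as log4j suggests.
    private static final Logger LOG =
            Logger.getLogger(TracingWordCountMapper.class);

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void setup(Context context) {
        // Runs once per map task, so I can at least see when Hadoop starts it.
        LOG.info("map task started: " + context.getTaskAttemptID());
    }

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // I know this method gets called -- but what calls it, and what
        // happens before and after, is exactly what I want to trace.
        LOG.debug("map() called with value: " + value);
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}
```

But this only logs the methods I already know about (setup and map), which is exactly my problem: I have no idea where in Hadoop's own code the interesting calls happen.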
Also -- although I know that debugging a distributed system such as Hadoop is not recommended, I still wonder whether I could load Hadoop into a debugger, say Eclipse, and trace the code locally. I am confused about how to load the whole Hadoop project into Eclipse, because there are lots of library JAR files, source files, and dependencies among them. Has anyone traced Hadoop locally in Eclipse? Could you tell me how to handle those dependencies and load the project?
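The closest thing I can imagine is forcing the job to run in a single JVM, like the sketch below, so that breakpoints set in Eclipse would be hit directly. The property names (mapreduce.framework.name and fs.defaultFS) are what I found in the Hadoop 2.x documentation, so I am not sure this is the recommended way:

```java
// A sketch of the local-mode driver I have in mind: run the whole job in one
// JVM (no cluster) so a debugger can step through it. Property names are from
// the Hadoop 2.x docs; the class name LocalDebugDriver is made up by me.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LocalDebugDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Use the local job runner and the local filesystem instead of a
        // cluster, so driver, mapper, and reducer all run in this one JVM.
        conf.set("mapreduce.framework.name", "local");
        conf.set("fs.defaultFS", "file:///");

        Job job = Job.getInstance(conf, "word count (local debug)");
        job.setJarByClass(LocalDebugDriver.class);
        job.setMapperClass(TracingWordCountMapper.class); // the mapper sketched above
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // local input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // local output dir

        // With everything in-process, a breakpoint on map() in Eclipse
        // should be hit when this line runs.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If this is naive, or if people trace Hadoop's internals some other way, I would love to know.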
I know these are very basic (and silly) questions, sorry :$ and thank you very much! If possible, please help me out so that I can at least get started. Any suggestions and advice will be greatly appreciated. Thanks again!

Best,
Rita :)