Hi! :)

I am a total beginner in both Java and Hadoop MapReduce. My current
goal is to understand how Hadoop works internally when it processes a
MapReduce application. So far I know two things:

1. I'd better use log4j to trace the code.
2. Debugging a distributed system is generally not a good idea.

My plan is to use log4j: put some loggers in the methods that will be
involved while Hadoop processes the MapReduce application. But here I
get stuck, because I don't know which methods those are. It's a
chicken-and-egg problem: I want to find out which methods are involved
(and what happens in them), so I plan to use log4j; but in order to
use log4j, I first have to know which methods are involved so that I
know where to put my loggers.
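To show what I mean by "putting loggers in methods", here is the kind
of placement I have in mind for the classic WordCount mapper (the
class name WordCountMapper is my own; as far as I understand, the log
line should end up in the task's log files):

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.log4j.Logger;

    public class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final Logger LOG =
                Logger.getLogger(WordCountMapper.class);
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Trace each invocation; this goes to the task's log4j output.
            LOG.info("map() called with key=" + key);
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

But this only covers my own mapper code, not Hadoop's internal
methods, which is exactly the part I don't know how to instrument.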

Is there any way I could trace the call stack? Or maybe I just don't
know how to use log4j cleverly: could I put a single logger inside the
main method of a MapReduce application (say, WordCount.java) and have
it magically record all the trace information for me?
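One idea I had: log a Throwable from inside any method I suspect is
involved, since log4j then prints the whole call stack at that point.
A sketch of the helper I imagine (StackTraceUtil and logCurrentStack
are names I made up):

    import org.apache.log4j.Logger;

    public final class StackTraceUtil {
        private StackTraceUtil() {}

        // Passing a Throwable makes log4j print its full stack trace,
        // so the log shows which methods led to this point.
        public static void logCurrentStack(Logger log, String where) {
            log.info("Reached " + where, new Throwable("call stack at " + where));
        }
    }

Then a single call like StackTraceUtil.logCurrentStack(LOG,
"WordCountMapper.map") from inside a suspect method would record its
callers without attaching a debugger. But is there a less manual way?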

Also, although I know debugging a distributed system like Hadoop is
not recommended, I still wonder if I could load Hadoop into a
debugger, say Eclipse, and trace the code locally. I am confused about
how to load the whole project into Eclipse, because there are lots of
library jars, source files, and dependencies among them. Has anyone
traced Hadoop locally in Eclipse? May I know how to handle those
dependencies and load the project?

... I know these are very basic (and maybe silly) questions, sorry :$
and thank you very much! If possible, please help me out so that I can
at least get started. Any suggestions and advice will be greatly
appreciated. Thanks again!

Best,
Rita :)
