If you start your job in a single-node cluster with a "local"
configuration (conf.set("mapred.job.tracker", "local") and
conf.set("fs.default.name", "local")), you can _almost_ debug all the
vital parts. I use this method (though it's been deprecated) to debug
my Map and Reduce functions locally.

Remote debugging with multiple nodes would still be cool to have, though.
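
For what it's worth, here is a rough (untested) sketch of what I mean, using
the old 0.19/0.20 "mapred" API. The identity classes and the "input"/"output"
paths are only placeholders -- swap in your own Mapper/Reducer and set
breakpoints there. With both properties set to "local" the whole job runs in a
single JVM, so an IDE debugger can step straight into map() and reduce():

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class LocalDebugDriver {
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(LocalDebugDriver.class);
    conf.setJobName("local-debug");

    // Run the whole job in-process instead of submitting to a cluster,
    // so breakpoints in map()/reduce() are hit when launched from an IDE.
    conf.set("mapred.job.tracker", "local");
    // Use the local filesystem instead of HDFS ("local" is the old,
    // deprecated spelling; "file:///" also works on later releases).
    conf.set("fs.default.name", "local");

    // Identity classes keep the sketch self-contained; replace them with
    // your own Mapper/Reducer when debugging real code.
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    // Plain local directories; "output" must not already exist.
    FileInputFormat.setInputPaths(conf, new Path("input"));
    FileOutputFormat.setOutputPath(conf, new Path("output"));

    JobClient.runJob(conf);  // blocks until the job finishes
  }
}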
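
And for the log4j tracing Piyush suggests below, the usual pattern is roughly
the following (again only a sketch; the class name and the tokenizing logic
are made up for illustration). Messages logged at DEBUG end up in the task
logs under the logs/ directory once you raise the level for your class in
conf/log4j.properties, e.g. log4j.logger.LoggingWordCountMapper=DEBUG:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.log4j.Logger;

public class LoggingWordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  // log4j ships with Hadoop; per-task output lands in the task logs
  // (logs/userlogs/<task-id>/syslog on most releases).
  private static final Logger LOG =
      Logger.getLogger(LoggingWordCountMapper.class);

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // Trace statement in place of a breakpoint.
    LOG.debug("map() at offset " + key + " with line: " + value);
    for (String token : value.toString().split("\\s+")) {
      if (token.length() == 0) {
        continue;
      }
      word.set(token);
      output.collect(word, ONE);
    }
  }
}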

On Sun, Aug 15, 2010 at 10:40 AM, Rita Liu <[email protected]> wrote:
> Thank you very much, Piyush! I'll do as you say :DD Thanks a lot!!
>
> Thanks, Smith :) Hmm ... I see. OK :)
>
> Please give me more guidance and suggestions if possible, dear experts!
> -Rita :))
>
> On Sat, Aug 14, 2010 at 10:09 PM, smith jack <[email protected]> wrote:
>
>> That means you can only trace by log; it is not possible to step-debug
>> Hadoop, haha. Distributed systems always introduce extra complexity and
>> confusing issues.
>>
>> 2010/8/15 Piyush Garg <[email protected]>:
>> > Hi Rita,
>> >
>> > You can put log4j logger debug statements in the code. The log4j library
>> > is part of the Hadoop framework, there is already a log4j.properties file
>> > in the Hadoop conf directory, and all the output logs are saved in the
>> > Hadoop logs directory.
>> >
>> > Thanks and Regards
>> > Piyush Garg
>> >
>> > On Sunday 15 August 2010 10:20 AM, Rita Liu wrote:
>> >> Thank you very much, Piyush! :) May I know more about how to use
>> >> "traces"?
>> >>
>> >> And -- yes, please teach me if possible, experts! :)
>> >>
>> >> Thanks a lot,
>> >> -Rita :))
>> >>
>> >> On Sat, Aug 14, 2010 at 9:42 PM, Piyush Garg <[email protected]> wrote:
>> >>
>> >>> Hi Rita,
>> >>>
>> >>> I have just started to learn Hadoop as well; I know there is a long way
>> >>> to go. I found some useful links which I am sharing with you.
>> >>>
>> >>> Hadoop Tutorial - YDN
>> >>> <http://developer.yahoo.com/hadoop/tutorial/index.html> -- an excellent,
>> >>> well-organized beginners' tutorial.
>> >>> Running Hadoop On Ubuntu Linux (Single-Node Cluster) - Michael G. Noll
>> >>> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29>
>> >>> Running Hadoop On Ubuntu Linux (Multi-Node Cluster)
>> >>> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29>
>> >>> The tutorial on the Hadoop wiki
>> >>> <http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html> is
>> >>> too much for a beginner.
>> >>>
>> >>> Debugger:
>> >>> I do not think you can easily debug using a remote debugger. This is
>> >>> natural: since Hadoop is not sequential programming, it would be very
>> >>> difficult to debug its apps. The only way to debug is to use traces.
>> >>>
>> >>> I think you can learn how to set up a multi-node cluster, but for
>> >>> practice sessions you can use a single-node setup.
>> >>>
>> >>> Let's see what the experts say.
>> >>>
>> >>> Thanks and Regards
>> >>> Piyush Garg
>> >>>
>> >>> On Sunday 15 August 2010 09:07 AM, Rita Liu wrote:
>> >>>> Hi!
>> >>>>
>> >>>> I am a total beginner, but I am very interested in Hadoop. I've already
>> >>>> downloaded Hadoop 0.19.2 and run it on Ubuntu in single-node mode. Now
>> >>>> I want to do two things:
>> >>>>
>> >>>> 1. Explore how Hadoop works internally with one of the example
>> >>>> applications Hadoop provides
>> >>>> 2. Write an application of my own
>> >>>>
>> >>>> These two things bring up the following questions:
>> >>>>
>> >>>> a. Debugger?
>> >>>> I am stuck since I don't know how to "explore" Hadoop. I used to trace
>> >>>> through the code using a debugger, but in this case, I don't know if
>> >>>> there is a good debugger to use; or -- maybe a debugger is not
>> >>>> necessary for Hadoop? If not, then how do you trace through the code to
>> >>>> either debug or just gain an understanding of the system? May I know
>> >>>> what you, experienced experts, do? :)
>> >>>>
>> >>>> b. Where to run Hadoop?
>> >>>> Also -- may I know where you run your Hadoop? Do you run it on Linux,
>> >>>> or on a VM -- in particular, Cloudera? I heard that Cloudera is good
>> >>>> for writing MapReduce applications with Hadoop itself as a black box;
>> >>>> is that true? If my ultimate goal is to understand how Hadoop works
>> >>>> internally, would it be better if I ran it directly on Linux?
>> >>>>
>> >>>> c. Single-node or multi-node?
>> >>>> In the beginning (just like my case :p), would it be better to use
>> >>>> single-node or multi-node? If the latter, should I obtain more
>> >>>> machines, or should I use more virtual machines to create more nodes?
>> >>>>
>> >>>> As a newbie, I am sorry for all these basic (and silly, I know :$)
>> >>>> questions. If possible, please help me out? Any suggestion or advice
>> >>>> will be greatly appreciated. Thank you very much!
>> >>>>
>> >>>> Best,
>> >>>> Rita :)
>> >>>>
>> >>>> P.S. If my questions are not suitable for this mailing list, please
>> >>>> let me apologize, and then could you please direct me to other mailing
>> >>>> lists? Sorry, and thanks a lot! :)

--
Harsh J
www.harshj.com
