Hi Harsh and Piyush! Thank you very much.

So it seems like it would be best if I use log4j to trace, and debugging with a debugger is still possible if I set "mapred.job.tracker" to be "local" and "fs.default.name" to be "local", in hadoop-site.xml. Plus: in hadoop-env.sh, I should specify HADOOP_OPTS to be:

"-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"

(why 8000? also, what does "-agentlib:jdwp=transport=dt_socket" mean?) ... in order to use a debugger. Is my understanding correct? :)

If so -- then which debugger do you use? May I know?

Thanks a lot! I am also going to try log4j now!

Many thanks,
-Rita :))

On Sat, Aug 14, 2010 at 10:22 PM, Piyush Garg <[email protected]> wrote:

> Hi Smith,
>
> step debugging also works in hadoop as with other java applications.
> export
> HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"
> 'suspend=y' is to let the jvm suspend until the remote debugger is
> attached.
>
> Thanks and Regards
> Piyush Garg
>
>
> On Sunday 15 August 2010 10:39 AM, smith jack wrote:
> > that means you can only trace by log,
> > and not possible to debug hadoop using step debug, haha
> > distributed system always introduce extra complexity and confusing
> > issues.
> >
> > 2010/8/15 Piyush Garg <[email protected]>:
> >
> >> Hi Rita,
> >>
> >> You can put log4j logger debug statements in the code. log4j library is
> >> part of hadoop framework and there is already a log4j.properties file in
> >> hadoop conf directory and all the output logs are saved in hadoop logs
> >> directory.
> >>
> >> Thanks and Regards
> >> Piyush Garg
> >>
> >>
> >> On Sunday 15 August 2010 10:20 AM, Rita Liu wrote:
> >>
> >>> Thank you very much, Piyush! :) May I know more about how to use
> >>> "traces"?
> >>>
> >>> And -- yes, please teach me if possible, experts! :)
> >>>
> >>> Thanks a lot,
> >>> -Rita :))
> >>>
> >>> On Sat, Aug 14, 2010 at 9:42 PM, Piyush Garg <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi Rita,
> >>>>
> >>>> I have just started to learn hadoop as well, I know there is a long
> >>>> way to go.
> >>>> I found some useful links which I am sharing with you.
> >>>>
> >>>> Hadoop Tutorial - YDN
> >>>> <http://developer.yahoo.com/hadoop/tutorial/index.html> excellent
> >>>> beginners tutorial and well organized.
> >>>> Running Hadoop On Ubuntu Linux (Single-Node Cluster) - Michael G. Noll
> >>>> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29>
> >>>> Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
> >>>> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29>
> >>>> The tutorial on the hadoop wiki
> >>>> <http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html> is
> >>>> too much for a beginner.
> >>>>
> >>>> Debugger:
> >>>> I do not think you can easily do debugging using remote debugger. This
> >>>> is natural since hadoop is not sequential programming, it would be very
> >>>> difficult to debug its apps.
> >>>> The only way to debug is to use traces.
> >>>>
> >>>> I think you can learn how to setup multi-node cluster, but for practice
> >>>> session you can use single node setup.
> >>>>
> >>>> Lets see what the experts say.
> >>>>
> >>>> Thanks and Regards
> >>>> Piyush Garg
> >>>>
> >>>>
> >>>> On Sunday 15 August 2010 09:07 AM, Rita Liu wrote:
> >>>>
> >>>>> Hi!
> >>>>>
> >>>>> I am a total beginner, but I am very interested in hadoop. I've already
> >>>>> downloaded hadoop 0.19.2 and run on Ubuntu in single-node mode. Now I
> >>>>> want to do two things:
> >>>>>
> >>>>> 1. Explore how hadoop works internally with one of the example
> >>>>> applications hadoop provides
> >>>>> 2. Write an application on my own
> >>>>>
> >>>>> Those two things bring me following questions:
> >>>>>
> >>>>> a. debugger?
> >>>>> I am stuck since I don't know how to "explore" hadoop.
> >>>>> I used to trace
> >>>>> through the code using a debugger, but in this case, I don't know if
> >>>>> there is a good debugger to use; or -- maybe a debugger is not
> >>>>> necessary for hadoop? If not, then how do you trace through the code to
> >>>>> either debug or just gain an understanding about the system? May I know
> >>>>> what you, experienced experts, do? :)
> >>>>>
> >>>>> b. Where to run hadoop?
> >>>>> Also -- may I know where you run your hadoop? Do you run on linux, or on
> >>>>> VM -- in particular, Cloudera? I heard that Cloudera is good for writing
> >>>>> mapreduce applications with hadoop itself as a blackbox; is it true? If
> >>>>> my ultimate goal is to understand how hadoop works internally, would it
> >>>>> be better if I directly run it on linux?
> >>>>>
> >>>>> c. Single-node or multi-node?
> >>>>> In the beginning (just like my case :p) would it be better to use
> >>>>> single-node or multi-node? If the latter is true, should I obtain more
> >>>>> machines, or should I use more virtual machines to create more nodes?
> >>>>>
> >>>>> As a newbie, I am sorry for all those basic (and silly, I know :$)
> >>>>> questions. If possible, please help me out? Any suggestion or advice
> >>>>> will be greatly appreciated. Thank you very much!
> >>>>>
> >>>>> Best,
> >>>>> Rita :)
> >>>>>
> >>>>> P.S. If my questions are not suitable for this mailing-list, please let
> >>>>> me apologize, and then, could you please direct me to other
> >>>>> mailing-lists? Sorry, and thanks a lot! :)
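
For reference, the debugger setting discussed in this thread can be written out as a hadoop-env.sh fragment with each part of the option string annotated. This is only a minimal sketch: it assumes a JVM that ships the standard JDWP agent and that port 8000 is free on the machine (8000 is just a common convention, not a required value). The jdb command in the closing comment is an illustration of one way to attach, not something anyone in the thread reported running.

```shell
# hadoop-env.sh fragment (sketch): start the Hadoop JVM with the JDWP debug agent.
#   -agentlib:jdwp       load the Java Debug Wire Protocol (JDWP) agent into the JVM
#   transport=dt_socket  talk to the debugger over a TCP socket
#   server=y             the JVM listens and waits for a debugger to connect
#   suspend=y            pause the JVM at startup until a debugger attaches
#   address=8000         port to listen on (assumption: any free port would do)
export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"

# With "mapred.job.tracker" set to "local", the job runs inside this one JVM,
# so a debugger can attach from another terminal, for example with jdb:
#   jdb -connect com.sun.jdi.SocketAttach:hostname=localhost,port=8000
```

Any debugger that speaks JDWP should be able to attach the same way, for instance an IDE's "remote Java application" configuration pointed at localhost and the chosen port.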
