If you start your job in a single-node cluster with a "local"
configuration (conf.set("mapred.job.tracker", "local") and
conf.set("fs.default.name", "local")), you can _almost_ debug all the
vital parts. I use this method (though it's been deprecated) to debug
my Map and Reduce functions locally.

Remote debugging with multiple nodes would still be cool to have, though.
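
For what it's worth, here is a rough (untested) sketch of what I mean, using
the old 0.19/0.20 "mapred" API. The identity classes and the "input"/"output"
paths are only placeholders -- swap in your own Mapper/Reducer and set
breakpoints there. With both properties set to "local" the whole job runs in a
single JVM, so an IDE debugger can step straight into map() and reduce():

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class LocalDebugDriver {
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(LocalDebugDriver.class);
    conf.setJobName("local-debug");

    // Run the whole job in-process instead of submitting to a cluster,
    // so breakpoints in map()/reduce() are hit when launched from an IDE.
    conf.set("mapred.job.tracker", "local");
    // Use the local filesystem instead of HDFS ("local" is the old,
    // deprecated spelling; "file:///" also works on later releases).
    conf.set("fs.default.name", "local");

    // Identity classes keep the sketch self-contained; replace them with
    // your own Mapper/Reducer when debugging real code.
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    // Plain local directories; "output" must not already exist.
    FileInputFormat.setInputPaths(conf, new Path("input"));
    FileOutputFormat.setOutputPath(conf, new Path("output"));

    JobClient.runJob(conf);  // blocks until the job finishes
  }
}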
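
And for the log4j tracing Piyush suggests below, the usual pattern is roughly
the following (again only a sketch; the class name and the tokenizing logic
are made up for illustration). Messages logged at DEBUG end up in the task
logs under the logs/ directory once you raise the level for your class in
conf/log4j.properties, e.g. log4j.logger.LoggingWordCountMapper=DEBUG:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.log4j.Logger;

public class LoggingWordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  // log4j ships with Hadoop; per-task output lands in the task logs
  // (logs/userlogs/<task-id>/syslog on most releases).
  private static final Logger LOG =
      Logger.getLogger(LoggingWordCountMapper.class);

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // Trace statement in place of a breakpoint.
    LOG.debug("map() at offset " + key + " with line: " + value);
    for (String token : value.toString().split("\\s+")) {
      if (token.length() == 0) {
        continue;
      }
      word.set(token);
      output.collect(word, ONE);
    }
  }
}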

On Sun, Aug 15, 2010 at 10:40 AM, Rita Liu <[email protected]> wrote:
> Thank you very much, Piyush! I'll do as you say :DD Thanks a lot!!
>
> Thanks, Smith :) Hmm ... I see. OK :)
>
> Please give me more guidance and suggestions if possible, dear experts!
> -Rita :))
>
> On Sat, Aug 14, 2010 at 10:09 PM, smith jack <[email protected]> wrote:
>
>> That means you can only trace by log; it is not possible to step-debug
>> Hadoop, haha. Distributed systems always introduce extra complexity and
>> confusing issues.
>>
>> 2010/8/15 Piyush Garg <[email protected]>:
>> > Hi Rita,
>> >
>> > You can put log4j logger debug statements in the code. The log4j library
>> > is part of the Hadoop framework, there is already a log4j.properties file
>> > in the Hadoop conf directory, and all the output logs are saved in the
>> > Hadoop logs directory.
>> >
>> > Thanks and Regards
>> > Piyush Garg
>> >
>> > On Sunday 15 August 2010 10:20 AM, Rita Liu wrote:
>> >> Thank you very much, Piyush! :) May I know more about how to use
>> >> "traces"?
>> >>
>> >> And -- yes, please teach me if possible, experts! :)
>> >>
>> >> Thanks a lot,
>> >> -Rita :))
>> >>
>> >> On Sat, Aug 14, 2010 at 9:42 PM, Piyush Garg <[email protected]> wrote:
>> >>
>> >>> Hi Rita,
>> >>>
>> >>> I have just started to learn Hadoop as well; I know there is a long way
>> >>> to go. I found some useful links which I am sharing with you.
>> >>>
>> >>> Hadoop Tutorial - YDN
>> >>> <http://developer.yahoo.com/hadoop/tutorial/index.html> -- an excellent,
>> >>> well-organized beginners' tutorial.
>> >>> Running Hadoop On Ubuntu Linux (Single-Node Cluster) - Michael G. Noll
>> >>> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29>
>> >>> Running Hadoop On Ubuntu Linux (Multi-Node Cluster)
>> >>> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29>
>> >>> The tutorial on the Hadoop wiki
>> >>> <http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html> is
>> >>> too much for a beginner.
>> >>>
>> >>> Debugger:
>> >>> I do not think you can easily debug using a remote debugger. This is
>> >>> natural: since Hadoop is not sequential programming, it would be very
>> >>> difficult to debug its apps. The only way to debug is to use traces.
>> >>>
>> >>> I think you can learn how to set up a multi-node cluster, but for
>> >>> practice sessions you can use a single-node setup.
>> >>>
>> >>> Let's see what the experts say.
>> >>>
>> >>> Thanks and Regards
>> >>> Piyush Garg
>> >>>
>> >>> On Sunday 15 August 2010 09:07 AM, Rita Liu wrote:
>> >>>> Hi!
>> >>>>
>> >>>> I am a total beginner, but I am very interested in Hadoop. I've already
>> >>>> downloaded Hadoop 0.19.2 and run it on Ubuntu in single-node mode. Now
>> >>>> I want to do two things:
>> >>>>
>> >>>> 1. Explore how Hadoop works internally with one of the example
>> >>>> applications Hadoop provides
>> >>>> 2. Write an application of my own
>> >>>>
>> >>>> These two things bring up the following questions:
>> >>>>
>> >>>> a. Debugger?
>> >>>> I am stuck since I don't know how to "explore" Hadoop. I used to trace
>> >>>> through the code using a debugger, but in this case, I don't know if
>> >>>> there is a good debugger to use; or -- maybe a debugger is not
>> >>>> necessary for Hadoop? If not, then how do you trace through the code to
>> >>>> either debug or just gain an understanding of the system? May I know
>> >>>> what you, experienced experts, do? :)
>> >>>>
>> >>>> b. Where to run Hadoop?
>> >>>> Also -- may I know where you run your Hadoop? Do you run it on Linux,
>> >>>> or on a VM -- in particular, Cloudera? I heard that Cloudera is good
>> >>>> for writing MapReduce applications with Hadoop itself as a black box;
>> >>>> is that true? If my ultimate goal is to understand how Hadoop works
>> >>>> internally, would it be better if I ran it directly on Linux?
>> >>>>
>> >>>> c. Single-node or multi-node?
>> >>>> In the beginning (just like my case :p), would it be better to use
>> >>>> single-node or multi-node? If the latter, should I obtain more
>> >>>> machines, or should I use more virtual machines to create more nodes?
>> >>>>
>> >>>> As a newbie, I am sorry for all these basic (and silly, I know :$)
>> >>>> questions. If possible, please help me out? Any suggestion or advice
>> >>>> will be greatly appreciated. Thank you very much!
>> >>>>
>> >>>> Best,
>> >>>> Rita :)
>> >>>>
>> >>>> P.S. If my questions are not suitable for this mailing list, please
>> >>>> let me apologize, and then could you please direct me to other mailing
>> >>>> lists? Sorry, and thanks a lot! :)

--
Harsh J
www.harshj.com
