Re: Hadoop basics

Piyush Garg Sat, 14 Aug 2010 22:23:23 -0700

Hi Smith,

Step debugging works in Hadoop just as it does with other Java applications.
Set:

export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"

'suspend=y' makes the JVM pause at startup until a remote debugger attaches.
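
As a minimal sketch, assuming a standalone single-node install, the full
sequence could look like the following. The examples jar name, the wordcount
job, and the input/output paths are assumptions for illustration; adjust them
to your setup.

```shell
# Port the JVM listens on for the debugger; 8000 matches the setting above.
DEBUG_PORT=8000

# JDWP agent options: act as a server on DEBUG_PORT and wait for a debugger.
export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${DEBUG_PORT}"
echo "HADOOP_OPTS=${HADOOP_OPTS}"

# Next steps, left as comments because paths and job names depend on the install:
#   bin/hadoop jar hadoop-0.19.2-examples.jar wordcount input output
#     (assumed example job; the JVM pauses and reports the port it listens on)
#   jdb -connect com.sun.jdi.SocketAttach:hostname=localhost,port=8000
#     (run from a second terminal; an IDE remote-debug configuration also works)
```

Note that HADOOP_OPTS applies to the JVM started by bin/hadoop. In standalone
mode the map and reduce code runs inside that same JVM, so breakpoints there
should be reachable. On a pseudo-distributed or real cluster the tasks run in
separate child JVMs, which this setting does not reach.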


Thanks and Regards
Piyush Garg


On Sunday 15 August 2010 10:39 AM, smith jack wrote:
> So that means you can only trace through logs,
> and it's not possible to step-debug hadoop, haha.
> Distributed systems always introduce extra complexity and confusing issues.
>
> 2010/8/15 Piyush Garg <[email protected]>:
>   
>> Hi Rita,
>>
>> You can put log4j debug statements in the code. The log4j library is
>> part of the hadoop framework, there is already a log4j.properties file in
>> the hadoop conf directory, and all the output logs are saved in the hadoop
>> logs directory.
>>
>> Thanks and Regards
>> Piyush Garg
>>
>>
>> On Sunday 15 August 2010 10:20 AM, Rita Liu wrote:
>>     
>>> Thank you very much, Piyush! :) May I know more about how to use "traces"?
>>>
>>> And -- yes, please teach me if possible, experts! :)
>>>
>>> Thanks a lot,
>>> -Rita :))
>>>
>>> On Sat, Aug 14, 2010 at 9:42 PM, Piyush Garg <[email protected]> wrote:
>>>
>>>
>>>       
>>>> Hi Rita,
>>>>
>>>> I have just started to learn hadoop as well, and I know there is a long
>>>> way to go.
>>>> I found some useful links, which I am sharing with you.
>>>>
>>>> Hadoop Tutorial - YDN
>>>> <http://developer.yahoo.com/hadoop/tutorial/index.html>: an excellent,
>>>> well-organized beginner's tutorial.
>>>> Running Hadoop On Ubuntu Linux (Single-Node Cluster) - Michael G. Noll
>>>> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29>
>>>> Running Hadoop On Ubuntu Linux (Multi-Node Cluster)
>>>> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29>
>>>> The tutorial on the hadoop wiki
>>>> <http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html> is
>>>> too much for a beginner.
>>>>
>>>> Debugger:
>>>> I do not think you can easily do debugging using remote debugger. This
>>>> is natural since hadoop is not sequential programming, it would be very
>>>> difficult to debug its apps.
>>>> The only way to debug is to use traces.
>>>>
>>>> I think you can learn how to set up a multi-node cluster, but for
>>>> practice sessions you can use a single-node setup.
>>>>
>>>> Let's see what the experts say.
>>>>
>>>> Thanks and Regards
>>>> Piyush Garg
>>>>
>>>>
>>>> On Sunday 15 August 2010 09:07 AM, Rita Liu wrote:
>>>>
>>>>         
>>>>> Hi!
>>>>>
>>>>> I am a total beginner, but I am very interested in hadoop. I've already
>>>>> downloaded hadoop 0.19.2 and run on Ubuntu in single-node mode. Now I
>>>>> want to do two things:
>>>>>
>>>>> 1. Explore how hadoop works internally with one of the example
>>>>> applications hadoop provides
>>>>> 2. Write an application on my own
>>>>>
>>>>> Those two things bring me to the following questions:
>>>>>
>>>>> a. debugger?
>>>>> I am stuck since I don't know how to "explore" hadoop. I used to trace
>>>>> through the code using a debugger, but in this case, I don't know if there
>>>>> is a good debugger to use; or -- maybe a debugger is not necessary for
>>>>> hadoop? If not, then how do you trace through the code to either debug or
>>>>> just gain an understanding about the system? May I know what you,
>>>>> experienced experts, do? :)
>>>>>
>>>>> b. Where to run hadoop?
>>>>> Also -- may I know where you run your hadoop? Do you run on linux, or on
>>>>> VM -- in particular, Cloudera? I heard that Cloudera is good for writing
>>>>> mapreduce applications with hadoop itself as a blackbox; is it true? If
>>>>> my ultimate goal is to understand how hadoop works internally, would it be
>>>>> better if I directly run it on linux?
>>>>>
>>>>> c. Single-node or multi-node?
>>>>> In the beginning (just like my case :p) would it be better to use
>>>>> single-node or multi-node? If the latter is true, should I obtain more
>>>>> machines, or should I use more virtual machines to create more nodes?
>>>>>
>>>>> As a newbie, I am sorry for all those basic (and silly, I know :$)
>>>>> questions. If possible, please help me out? Any suggestion or advice will
>>>>> be greatly appreciated. Thank you very much!
>>>>>
>>>>> Best,
>>>>> Rita :)
>>>>>
>>>>> P.S. If my questions are not suitable for this mailing-list, please let
>>>>> me apologize, and then, could you please direct me to other mailing-lists?
>>>>> Sorry, and thanks a lot! :)
