Hi Rita,
If you have reached the point where you are working with an API like Hadoop, forget about step-debugging the code. Your code must be syntactically and logically error free; for everything else, logging is enough. Just try log4j.
Thanks,
Amit Kumar Verma
Verchaska Infotech Pvt. Ltd.
On 08/15/2010 11:10 AM, Rita Liu wrote:
Hi Harsh and Piyush! Thank you very much. So it seems it would be best if I use log4j to trace, and step debugging with a debugger is still possible if I set "mapred.job.tracker" to "local" and "fs.default.name" to "local" in hadoop-site.xml. Plus, in hadoop-env.sh I should set HADOOP_OPTS to:

"-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"

(why 8000? also, what does "-agentlib:jdwp=transport=dt_socket" mean?)
... in order to use a debugger. Is my understanding correct? :)
If so -- then which debugger do you use? May I know? Thanks a lot! I am also
going to try log4j now!
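By the way, I suppose the same local-mode settings could also be forced from code instead of hadoop-site.xml? Here is my untested sketch (the class name is made up, and I'm assuming the old JobConf API from 0.19):

import org.apache.hadoop.mapred.JobConf;

public class LocalModeSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // the same two properties I would otherwise put in hadoop-site.xml:
    conf.set("mapred.job.tracker", "local");  // run the job in-process, no JobTracker
    conf.set("fs.default.name", "local");     // use the local filesystem, no HDFS
    // ... then set input/output paths and submit with JobClient.runJob(conf)
  }
}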
Many thanks,
-Rita :))
On Sat, Aug 14, 2010 at 10:22 PM, Piyush Garg <[email protected]> wrote:
Hi Smith,
Step debugging also works in Hadoop, just as with other Java applications:

export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"

'suspend=y' makes the JVM suspend until the remote debugger is attached.
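('transport=dt_socket' tells the JDWP debugging agent to listen on a plain TCP socket, and 'address=8000' is the port it listens on; any free port will do, as long as the debugger connects to the same one.) Once the JVM is waiting, you can attach any remote debugger, for example jdb on the same machine:

jdb -attach 8000

or an IDE such as Eclipse, using a "Remote Java Application" debug configuration pointed at localhost:8000.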
Thanks and Regards
Piyush Garg
On Sunday 15 August 2010 10:39 AM, smith jack wrote:
That means you can only trace via logs, and it is not possible to step-debug Hadoop, haha. Distributed systems always introduce extra complexity and confusing issues.
2010/8/15 Piyush Garg <[email protected]>:
Hi Rita,
You can put log4j debug statements in your code. The log4j library is part of the Hadoop framework, there is already a log4j.properties file in Hadoop's conf directory, and all the output logs are saved in Hadoop's logs directory.
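For example, something like this in a mapper (just a rough sketch using the old 0.19 mapred API; the class name and log messages are made up):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.log4j.Logger;

public class TracingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  // log4j logger; its output ends up in the task logs under Hadoop's logs directory
  private static final Logger LOG = Logger.getLogger(TracingMapper.class);

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, LongWritable> output,
                  Reporter reporter) throws IOException {
    LOG.debug("map() called with key=" + key + ", value=" + value);
    output.collect(value, new LongWritable(1));  // dummy word-count-style emit
  }
}

Remember that the default log level is usually INFO, so you have to raise it to DEBUG for your own classes in conf/log4j.properties before the debug statements show up.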
Thanks and Regards
Piyush Garg
On Sunday 15 August 2010 10:20 AM, Rita Liu wrote:
Thank you very much, Piyush! :) May I know more about how to use "traces"?
And -- yes, please teach me if possible, experts! :)
Thanks a lot,
-Rita :))
On Sat, Aug 14, 2010 at 9:42 PM, Piyush Garg <[email protected]> wrote:
Hi Rita,
I have just started to learn Hadoop as well; I know there is a long way to go. I found some useful links which I am sharing with you:
Hadoop Tutorial - YDN
<http://developer.yahoo.com/hadoop/tutorial/index.html>
An excellent and well-organized beginners' tutorial.

Running Hadoop On Ubuntu Linux (Single-Node Cluster) - Michael G. Noll
<http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29>

Running Hadoop On Ubuntu Linux (Multi-Node Cluster) - Michael G. Noll
<http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29>

The tutorial on the Hadoop wiki
<http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html>
is too much for a beginner.
Debugger:
I do not think you can easily do debugging using a remote debugger. This is natural: since Hadoop is not sequential programming, it would be very difficult to debug its apps that way. The only way to debug is to use traces.

I think you can learn how to set up a multi-node cluster, but for practice sessions you can use a single-node setup.

Let's see what the experts say.
Thanks and Regards
Piyush Garg
On Sunday 15 August 2010 09:07 AM, Rita Liu wrote:
Hi!
I am a total beginner, but I am very interested in Hadoop. I've already downloaded hadoop 0.19.2 and run it on Ubuntu in single-node mode. Now I want to do two things:

1. Explore how Hadoop works internally with one of the example applications Hadoop provides
2. Write an application on my own
These two things bring up the following questions:

a. Debugger?
I am stuck since I don't know how to "explore" Hadoop. I used to trace through code using a debugger, but in this case I don't know if there is a good debugger to use; or maybe a debugger is not necessary for Hadoop? If not, how do you trace through the code, either to debug or just to gain an understanding of the system? May I know what you experienced experts do? :)
b. Where to run Hadoop?
Also, may I know where you run your Hadoop? Do you run it on Linux, or on a VM, in particular Cloudera's? I heard that Cloudera is good for writing MapReduce applications with Hadoop itself as a black box; is that true? If my ultimate goal is to understand how Hadoop works internally, would it be better to run it directly on Linux?
c. Single-node or multi-node?
In the beginning (just like my case :p), would it be better to use single-node or multi-node? If the latter, should I obtain more machines, or use virtual machines to create more nodes?

As a newbie, I am sorry for all these basic (and silly, I know :$) questions. If possible, please help me out? Any suggestion or advice will be greatly appreciated. Thank you very much!
Best,
Rita :)
P.S. If my questions are not suitable for this mailing list, please accept my apologies, and could you please direct me to other mailing lists? Sorry, and thanks a lot! :)