Hi Rita,
If you have reached the point where you are working with an API like Hadoop, forget about step-debugging the code. Your code must be syntactically and logically error free; for everything else, logging is enough. Just try log4j.
Thanks,
Amit Kumar Verma
Verchaska Infotech Pvt. Ltd.
On 08/15/2010 11:10 AM, Rita Liu wrote:
Hi Harsh and Piyush! Thank you very much. So it seems it would be best if I use log4j to trace, and step debugging with a debugger is still possible if I set "mapred.job.tracker" to "local" and "fs.default.name" to "local" in hadoop-site.xml. Plus, in hadoop-env.sh I should set HADOOP_OPTS to:

"-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"

(why 8000? also, what does "-agentlib:jdwp=transport=dt_socket" mean?)
... in order to use a debugger. Is my understanding correct? :)
If so -- then which debugger do you use? May I know? Thanks a lot! I am also
going to try log4j now!
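By the way, I suppose the same local-mode settings could also be forced from code instead of hadoop-site.xml? Here is my untested sketch (the class name is made up, and I'm assuming the old JobConf API from 0.19):

import org.apache.hadoop.mapred.JobConf;

public class LocalModeSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // the same two properties I would otherwise put in hadoop-site.xml:
    conf.set("mapred.job.tracker", "local");  // run the job in-process, no JobTracker
    conf.set("fs.default.name", "local");     // use the local filesystem, no HDFS
    // ... then set input/output paths and submit with JobClient.runJob(conf)
  }
}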
Many thanks,
-Rita :))
On Sat, Aug 14, 2010 at 10:22 PM, Piyush Garg <[email protected]> wrote:
Hi Smith,
Step debugging also works in Hadoop, just as with other Java applications:

export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"

'suspend=y' makes the JVM suspend until the remote debugger is attached.
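('transport=dt_socket' tells the JDWP debugging agent to listen on a plain TCP socket, and 'address=8000' is the port it listens on; any free port will do, as long as the debugger connects to the same one.) Once the JVM is waiting, you can attach any remote debugger, for example jdb on the same machine:

jdb -attach 8000

or an IDE such as Eclipse, using a "Remote Java Application" debug configuration pointed at localhost:8000.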
Thanks and Regards
Piyush Garg
On Sunday 15 August 2010 10:39 AM, smith jack wrote:
That means you can only trace via logs, and it is not possible to step-debug Hadoop, haha. Distributed systems always introduce extra complexity and confusing issues.
2010/8/15 Piyush Garg <[email protected]>:
Hi Rita,
You can put log4j debug statements in your code. The log4j library is part of the Hadoop framework, there is already a log4j.properties file in Hadoop's conf directory, and all the output logs are saved in Hadoop's logs directory.
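For example, something like this in a mapper (just a rough sketch using the old 0.19 mapred API; the class name and log messages are made up):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.log4j.Logger;

public class TracingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  // log4j logger; its output ends up in the task logs under Hadoop's logs directory
  private static final Logger LOG = Logger.getLogger(TracingMapper.class);

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, LongWritable> output,
                  Reporter reporter) throws IOException {
    LOG.debug("map() called with key=" + key + ", value=" + value);
    output.collect(value, new LongWritable(1));  // dummy word-count-style emit
  }
}

Remember that the default log level is usually INFO, so you have to raise it to DEBUG for your own classes in conf/log4j.properties before the debug statements show up.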
Thanks and Regards
Piyush Garg
On Sunday 15 August 2010 10:20 AM, Rita Liu wrote:
Thank you very much, Piyush! :) May I know more about how to use "traces"?
And -- yes, please teach me if possible, experts! :)
Thanks a lot,
-Rita :))
On Sat, Aug 14, 2010 at 9:42 PM, Piyush Garg <[email protected]> wrote:
Hi Rita,
I have just started to learn Hadoop as well; I know there is a long way to go. I found some useful links which I am sharing with you:
Hadoop Tutorial - YDN
<http://developer.yahoo.com/hadoop/tutorial/index.html>
An excellent and well-organized beginners' tutorial.

Running Hadoop On Ubuntu Linux (Single-Node Cluster) - Michael G. Noll
<http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29>

Running Hadoop On Ubuntu Linux (Multi-Node Cluster) - Michael G. Noll
<http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29>

The tutorial on the Hadoop wiki
<http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html>
is too much for a beginner.
Debugger:
I do not think you can easily do debugging using a remote debugger. This is natural: since Hadoop is not sequential programming, it would be very difficult to debug its apps that way. The only way to debug is to use traces.

I think you can learn how to set up a multi-node cluster, but for practice sessions you can use a single-node setup.

Let's see what the experts say.
Thanks and Regards
Piyush Garg
On Sunday 15 August 2010 09:07 AM, Rita Liu wrote:
Hi!
I am a total beginner, but I am very interested in Hadoop. I've already downloaded hadoop 0.19.2 and run it on Ubuntu in single-node mode. Now I want to do two things:

1. Explore how Hadoop works internally with one of the example applications Hadoop provides
2. Write an application on my own
These two things bring up the following questions:

a. Debugger?
I am stuck since I don't know how to "explore" Hadoop. I used to trace through code using a debugger, but in this case I don't know if there is a good debugger to use; or maybe a debugger is not necessary for Hadoop? If not, how do you trace through the code, either to debug or just to gain an understanding of the system? May I know what you experienced experts do? :)
b. Where to run Hadoop?
Also, may I know where you run your Hadoop? Do you run it on Linux, or on a VM, in particular Cloudera's? I heard that Cloudera is good for writing MapReduce applications with Hadoop itself as a black box; is that true? If my ultimate goal is to understand how Hadoop works internally, would it be better to run it directly on Linux?
c. Single-node or multi-node?
In the beginning (just like my case :p), would it be better to use single-node or multi-node? If the latter, should I obtain more machines, or use virtual machines to create more nodes?

As a newbie, I am sorry for all these basic (and silly, I know :$) questions. If possible, please help me out? Any suggestion or advice will be greatly appreciated. Thank you very much!
Best,
Rita :)
P.S. If my questions are not suitable for this mailing list, please accept my apologies, and could you please direct me to other mailing lists? Sorry, and thanks a lot! :)