Re: Absolute Newbie

Thomas Jungblut Fri, 11 Nov 2011 06:01:50 -0800

Hey,

thanks for your interest. The project is currently a bit chaotic and not
well documented, but that's open source ;))
I'll answer your questions one by one.


>   1. I am an absolute newbie to Hama and Hadoop. Should I learn hadoop
>   first before I can begin contributing to this project?


Officially we only use HDFS from Hadoop, so it is enough if you're familiar
with the FileSystem API [1].
That includes being familiar with the Writable interface [2], which lets
you serialize and deserialize objects.
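
To make the Writable part concrete, here is a minimal sketch of a type that
serializes and deserializes itself. The Writable interface below is a local
stand-in that mirrors the two methods of org.apache.hadoop.io.Writable, so
the example compiles without Hadoop on the classpath. VertexMessage and
roundTrip are hypothetical names made up for illustration, not classes from
the Hama code base.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical message type, made up for this sketch.
class VertexMessage implements Writable {
  int vertexId;
  double value;

  // No-arg constructor so an empty instance can be filled by readFields.
  VertexMessage() {}

  VertexMessage(int vertexId, double value) {
    this.vertexId = vertexId;
    this.value = value;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(vertexId);
    out.writeDouble(value);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Fields must be read in the same order they were written.
    vertexId = in.readInt();
    value = in.readDouble();
  }

  // Serializes to a byte array and reads the bytes back into a fresh object.
  static VertexMessage roundTrip(VertexMessage original) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      original.write(new DataOutputStream(bytes));
      VertexMessage copy = new VertexMessage();
      copy.readFields(
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    VertexMessage copy = roundTrip(new VertexMessage(7, 0.5));
    System.out.println(copy.vertexId + " " + copy.value);
  }
}

// Local stand-in (an assumption of this sketch) with the same two methods
// as org.apache.hadoop.io.Writable.
interface Writable {
  void write(DataOutput out) throws IOException;

  void readFields(DataInput in) throws IOException;
}
```

With Hadoop available you would drop the local interface and import the
real one instead; the two method bodies stay the same.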

>   2. I don't exactly understand how hama works and what it is. All I
>   understand is that it's a graph library written over a distributed
>   architecture hadoop.
>   Where can get to know the basics of the hama, as I have already stated
>   before that do I also need to learn hadoop?


It is not necessarily a graph library; Hama is a BSP (Bulk Synchronous
Parallel) framework. You can familiarize yourself with BSP by reading the
Wikipedia article [3].
However, you can solve graph problems with it, as well as matrix operations
or other fancy stuff like real-time stream processing.
As with the last question, you don't need to understand MapReduce (I guess
that's what you mean by Hadoop in this case) to understand BSP, but once
you understand BSP, you will understand MapReduce. Hope that points you in
the right direction ;)
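
If it helps to see the BSP idea in code: a superstep consists of local
computation, sending messages, and a barrier synchronization, and messages
only become visible after the barrier. The sketch below imitates that with
plain Java threads and a CyclicBarrier. It does not use the Hama API and is
not how Hama implements it; BspSketch and run are names made up for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

class BspSketch {

  // Runs two supersteps on `peers` threads and returns what each peer received.
  static int[] run(int peers) {
    CyclicBarrier barrier = new CyclicBarrier(peers);
    List<Queue<Integer>> inboxes = new ArrayList<>();
    for (int p = 0; p < peers; p++) {
      inboxes.add(new ConcurrentLinkedQueue<>());
    }
    int[] received = new int[peers];
    Thread[] threads = new Thread[peers];

    for (int p = 0; p < peers; p++) {
      final int id = p;
      threads[p] = new Thread(() -> {
        try {
          // Superstep 0: local computation, then send to the right neighbour.
          int localResult = id * 10;
          inboxes.get((id + 1) % peers).add(localResult);

          // Barrier synchronization: nobody continues until all peers have sent.
          barrier.await();

          // Superstep 1: the message sent in superstep 0 is now available.
          received[id] = inboxes.get(id).poll();
        } catch (InterruptedException | BrokenBarrierException e) {
          throw new RuntimeException(e);
        }
      });
      threads[p].start();
    }

    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return received;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(run(3)));
  }
}
```

With three peers, peer 0 receives 20 from peer 2, peer 1 receives 0 from
peer 0, and peer 2 receives 10 from peer 1, so run(3) returns [20, 0, 10].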

>   3. On the getting started page, instructions are given with Maven and
>   SVN.
>   I have experience with git and not these. I found that the mirror github
>   repository and have forked it and would be working through it only. Is it
>   OK?


We work with patches (which are unified diffs), and that works with git as
well. Sadly you can't skip Maven; it is a must-have.
If you are aiming to be a long-term committer, no matter on which Apache
project, you will have to know how to use SVN.
The git repository is only a read-only mirror that is constantly synced
from SVN.
SVN is really easy, in my opinion easier than git, so this won't be a
problem.

>   4. To begin with I found this issue for newbies. HAMA-469.
>   https://issues.apache.org/jira/browse/HAMA-469
>   It says that statusUpdate() method should be called finally. So what I
>   can see that there is a
>   umbilical.statusUpdate(taskId, currentTaskStatus);
>   I will put it in finally block. I dont understand what this piece of
>   code wants to do. Basically what i have understood there is a cyclic
>   barrier kind of thing so as to create a rendezvous for many threads. Some
>   messages are combined and the function returns. I am still lost at
>   understanding the codebase.


Great that you've already found our issue tracker and the newbie issues.
Sadly the description does not cover everything, e.g. the motivation.
A quick explanation: if a failure occurs in the sync method, we want to
update the "umbilical", so that it knows that the sync has failed.
Adding a finally block is not the right way; you should take a look at the
catch clause instead.
At the moment it only contains an error log, but we want to make the status
update in this clause and make the process fail, i.e. throw a runtime
exception.
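
Roughly, the shape I have in mind looks like the sketch below. Everything in
it is a stand-in: the Umbilical interface, the String task id, and the
"RUNNING" / "FAILED" status values are assumptions made for illustration,
and the real Hama types and signatures differ. Only the structure matters:
log, update the status inside the catch clause, then rethrow as a runtime
exception.

```java
class SyncSketch {
  private final Umbilical umbilical;
  private final String taskId;

  SyncSketch(Umbilical umbilical, String taskId) {
    this.umbilical = umbilical;
    this.taskId = taskId;
  }

  // doSync stands for the real work (sending messages, entering the barrier).
  void sync(Runnable doSync) {
    try {
      doSync.run();
      umbilical.statusUpdate(taskId, "RUNNING");
    } catch (Exception e) {
      // The error log that is already there today.
      System.err.println("sync failed: " + e.getMessage());
      // New: tell the umbilical that the sync has failed ...
      umbilical.statusUpdate(taskId, "FAILED");
      // ... and make the process fail.
      throw new RuntimeException("sync failed for task " + taskId, e);
    }
  }

  public static void main(String[] args) {
    SyncSketch peer = new SyncSketch(
        (id, status) -> System.out.println(id + " -> " + status), "task_1");
    peer.sync(() -> {});
    try {
      peer.sync(() -> {
        throw new IllegalStateException("barrier broken");
      });
    } catch (RuntimeException e) {
      System.out.println("task failed: " + e.getMessage());
    }
  }
}

// Hypothetical stand-in for the umbilical; not the real Hama interface.
interface Umbilical {
  void statusUpdate(String taskId, String status);
}
```

A finally block would also run on the success path, so it cannot tell a
failed sync from a successful one on its own; the catch clause only runs
when something actually went wrong.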

Once you have read the Wikipedia article, I hope you will know what the
sync method should do (send messages!).
This isn't the whole story yet, but I think you can explore the rest
yourself by debugging a bit.

>   5. I also found that I can apply to apache for a mentor. Here is my
>   skillset [0] and I wish to become a long term contributor to projects
>   centred around hadoop.
>   [0] http://in.linkedin.com/in/apurv5
>   I am really looking forward to becoming a full fledged contributor in a
>   span of six months.


Nice CV, but it is enough if you can code in Java, are creative in finding
solutions, and can actually make them run as well.
I'm not sure if I can mentor you, but I guess we are all able to help you
once you face a problem.
Just ask on the mailing list or mail me directly ;)

Hope I clarified a few things. Looking forward to hearing from you!

Thomas

[1] http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html
[2] http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Writable.html
[3] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel

2011/11/11 Apurv Verma <[email protected]>

> Hii all,
>
>
>   1.  I am an absolute newbie to Hama and Hadoop. Should I learn hadoop
>   first before I can begin contributing to this project?
>
>   2. I don't exactly understand how hama works and what it is. All I
>   understand is that it's a graph library written over a distributed
>   architecture hadoop.
>   Where can get to know the basics of the hama, as I have already stated
>   before that do I also need to learn hadoop?
>
>   3. On the getting started page, instructions are given with Maven and
>   SVN.
>   I have experience with git and not these. I found that the mirror github
>   repository and have forked it and would be working through it only. Is it
>   OK?
>
>   4. To begin with I found this issue for newbies. HAMA-469.
>   https://issues.apache.org/jira/browse/HAMA-469
>   It says that statusUpdate() method should be called finally. So what I
>   can see that there is a
>
>   umbilical.statusUpdate(taskId, currentTaskStatus);
>   I will put it in finally block. I dont understand what this piece of
>   code wants to do. Basically what i have understood there is a cyclic
>   barrier kind of thing so as to create a rendezvous for many threads. Some
>   messages are combined and the function returns. I am still lost at
>   understanding the codebase.
>
>
>   5. I also found that I can apply to apache for a mentor. Here is my
>   skillset [0] and I wish to become a long term contributor to projects
>   centred around hadoop.
>   [0] http://in.linkedin.com/in/apurv5
>   I am really looking forward to becoming a full fledged contributor in a
>   span of six months.
>
>
>
> --
> thanks and regards,
>
> Apurv Verma
> B. Tech.(CSE)
> IIT- Ropar
>



-- 
Thomas Jungblut
Berlin <[email protected]>
