[jira] [Commented] (GIRAPH-13) Port Giraph to YARN

Eli Reisman (JIRA) Tue, 19 Feb 2013 09:13:14 -0800

    [ 
https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581447#comment-13581447
 ]


Eli Reisman commented on GIRAPH-13:
-----------------------------------

Thanks for your great review Hyunsik, great to hear from you!

I really appreciate your input! You successfully named ALL of my concerns! My 
biggest is the IO formats, which, as you said, are completely dependent on 
MRv1. Your idea is exactly the approach I was planning on.

As for your first concern: yes, this is a draft version, and the new one (I 
don't even have a patch up yet, but I will soon, to show you) will be 
completely configurable from the GiraphRunner CLI options.

For your second concern: there is a need for history and a number of other 
basic systems we get from MRv1 right now. Because of the timing (I am trying 
to finish this phase before the end of March), I may attempt to make GIRAPH-13 
cover just the following upgrade: a YARN profile for Giraph, including the 
ability to run examples/ applications from the Giraph jar-with-dependencies on 
YARN. I hope to handle all other "fleshing out" of the features in separate 
JIRAs or subissues. This bounds the difficulty of this first stage and lets 
others start working on the feature-add JIRAs without having to know all about 
YARN.

The exciting thing is that the YARN API allows much finer-grained control over 
a lot of our BSP process than Hadoop ever did. And I too was thinking that, 
after this, a port to Mesos (or wherever) is going to be really easy! As time 
passes, we might consider moving the launch of our ZooKeeper instance into the 
ApplicationMaster and doing more fine-grained resource allocation control 
(assign input splits right at the beginning of the job run, assign hosts to 
the workers as we choose for data locality, allot memory and/or cores 
depending on the size of the splits we assign, etc.). The options really open 
some doors.
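
To make the allocation idea a bit more concrete, here is a rough sketch in 
plain Java. It uses no YARN classes at all: SplitInfo, WorkerRequest, the 
512 MB base, and the 2 MB of heap per MB of input are made-up placeholders 
for illustration, not anything from the patch.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitPlanner {
    /** Hypothetical summary of one input split: a host holding its data, and its size. */
    static final class SplitInfo {
        final String preferredHost;
        final long lengthBytes;
        SplitInfo(String preferredHost, long lengthBytes) {
            this.preferredHost = preferredHost;
            this.lengthBytes = lengthBytes;
        }
    }

    /** Hypothetical request for one worker: where to run it and how much memory to ask for. */
    static final class WorkerRequest {
        final String host;
        final int memoryMb;
        WorkerRequest(String host, int memoryMb) {
            this.host = host;
            this.memoryMb = memoryMb;
        }
    }

    static final int BASE_MB = 512;         // assumed fixed overhead per worker
    static final int HEAP_MB_PER_INPUT_MB = 2; // assumed heap needed per MB of input

    /** Size the worker from its split and pin it to the host that holds the data. */
    static WorkerRequest requestFor(SplitInfo split) {
        long splitMb = (split.lengthBytes + (1L << 20) - 1) >> 20; // round up to whole MB
        long wantedMb = BASE_MB + HEAP_MB_PER_INPUT_MB * splitMb;
        int memoryMb = (int) Math.min(Integer.MAX_VALUE, wantedMb);
        return new WorkerRequest(split.preferredHost, memoryMb);
    }

    /** One request per split, decided up front before any worker is launched. */
    static List<WorkerRequest> plan(List<SplitInfo> splits) {
        List<WorkerRequest> requests = new ArrayList<>();
        for (SplitInfo split : splits) {
            requests.add(requestFor(split));
        }
        return requests;
    }
}
```

In a real ApplicationMaster each WorkerRequest would have to be turned into an 
actual container request, and the scheduler may not grant the preferred host, 
so this only shows the planning step.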

BUT, even just to make the examples run, the IO problem must be solved. I do 
think wrapping the MRv1-related functions (stuff that needs a 
TaskAttemptContext or Job-type classes from Hadoop, and more) is the way to 
go, but I would sure appreciate any ideas you might have.

Anyway, I will put up another patch, hopefully tonight or tomorrow, that is 
another significant upgrade over what you have seen here so far. All input and 
ideas are appreciated, thanks again!

                
> Port Giraph to YARN
> -------------------
>
>                 Key: GIRAPH-13
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-13
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Eli Reisman
>         Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch
>
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
> trunk, we should think about what it would take to separate out the graph 
> processing bits of Giraph from the MR1-specific code so as to take advantage 
> of the less-MR centric aspects of YARN, while still supporting both over the 
> medium term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
