[jira] [Commented] (GIRAPH-13) Port Giraph to YARN

Eli Reisman (JIRA) Tue, 19 Feb 2013 09:39:15 -0800

    [ https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581473#comment-13581473 ]


Eli Reisman commented on GIRAPH-13:
-----------------------------------

Hey, one more idea to throw out there regarding all the IO format issues with 
YARN. What do you think of this?

Since some of our internals are pretty tightly bound up with some MRv1 classes, 
we can do the refactor and wrapping already discussed above to hide this 
dependency. Another approach I might explore is a generic task runner (one that 
owns GraphTaskManager and replaces GraphMapper in our YARN implementation). The 
runner would instantiate the TaskAttemptContext and other Hadoop MRv1 classes, 
populate them with the information they need to run the job (taken from the 
giraphConfiguration and/or the YARN classes that report some of the same data 
to the running job), and hand those objects off to our Giraph code that 
expects them. Since this activity is self-contained in the runner class, no 
platform-dependent setup code (for YARN, Mesos, whoever) has to know anything 
about the runner's internals: it just creates the runner, hands it the data it 
needs, sets it running on the right compute nodes, etc.

This is a tiny bit hacky, but it gets the job done with minimal changes to 
existing code and leaves room for future JIRAs to do more extensive refactors. 
It also does not hide from the fact that we will still carry dependencies on 
the Hadoop JARs for as long as we also support MRv1, so we will have access to 
these classes to instantiate even on Mesos or YARN. I am not entirely sure this 
approach is possible, but it's one I have toyed with as an alternative to the 
full "wrap all MRv1 IO objects" approach.
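To make the runner idea a little more concrete, here is a minimal Java sketch 
under stated assumptions. MrTaskContext, GraphTaskManagerStub, 
GiraphTaskRunner, and the "example.computation" key are all hypothetical 
stand-ins written for illustration; they are not the real Hadoop 
TaskAttemptContext or Giraph GraphTaskManager classes. The sketch only shows 
the shape of the idea: the platform launcher passes plain configuration data 
to the runner, and only the runner builds the MRv1-style context that the 
existing Giraph code expects.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's MRv1 TaskAttemptContext. It carries only
// what this sketch needs and is NOT the real Hadoop class.
final class MrTaskContext {
    private final Map<String, String> conf;
    private final String taskAttemptId;

    MrTaskContext(Map<String, String> conf, String taskAttemptId) {
        // Defensive copy so later changes to the launcher's map are not seen.
        this.conf = Collections.unmodifiableMap(new HashMap<>(conf));
        this.taskAttemptId = taskAttemptId;
    }

    String get(String key) { return conf.get(key); }

    String getTaskAttemptId() { return taskAttemptId; }
}

// Hypothetical stand-in for existing Giraph code (GraphTaskManager in the
// comment above) that expects an MRv1-style context and knows nothing about
// YARN or Mesos.
final class GraphTaskManagerStub {
    private final MrTaskContext context;

    GraphTaskManagerStub(MrTaskContext context) { this.context = context; }

    String describe() {
        return "task " + context.getTaskAttemptId()
            + " computing " + context.get("example.computation");
    }
}

// Hypothetical platform-neutral runner: it owns the task manager and is the
// only place that fabricates MRv1-style objects from plain data.
final class GiraphTaskRunner {
    private final Map<String, String> giraphConf;
    private final String containerId;

    GiraphTaskRunner(Map<String, String> giraphConf, String containerId) {
        this.giraphConf = giraphConf;
        this.containerId = containerId;
    }

    GraphTaskManagerStub start() {
        // Map the platform's identity (e.g. a YARN container id) onto an
        // MRv1-style attempt id; the naming scheme here is an assumption.
        MrTaskContext ctx = new MrTaskContext(giraphConf, "attempt_" + containerId);
        return new GraphTaskManagerStub(ctx);
    }
}
```

A YARN or Mesos launcher would then only need something like 
new GiraphTaskRunner(conf, containerId).start(), so none of the MRv1-shaped 
types leak into the platform-specific setup code.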

Any opinions? I will be exploring the options for the IO dilemma in great 
detail later in the week and will post my findings/opinions as I survey the 
landscape. I just need to get the rest of the YARN job setup code done today 
and post that patch first...


                
> Port Giraph to YARN
> -------------------
>
>                 Key: GIRAPH-13
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-13
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Eli Reisman
>         Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch
>
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
> trunk, we should think about what it would take to separate out the graph 
> processing bits of Giraph from the MR1-specific code so as to take advantage 
> of the less-MR centric aspects of YARN, while still supporting both over the 
> medium term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
