[ https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581447#comment-13581447 ]
Eli Reisman commented on GIRAPH-13:
-----------------------------------
Thanks for the great review, Hyunsik, and great to hear from you!
I really appreciate your input! You named ALL of my concerns! The biggest one
is the IO formats, which, as you said, are completely dependent on MRv1.
Your idea is exactly the approach I was planning to take.
As for your first concern: yes, this is a draft version, and the new one (I
don't even have a patch up yet, but I will soon to show you) will be fully
configurable from the GiraphRunner CLI options.
As for your second concern: there is a need for history and a number of other
basic systems we currently get from MRv1. Because of the timing (I am trying to
finish this phase before the end of March), I may attempt to make GIRAPH-13
cover just the following upgrade: a YARN profile for Giraph, including the
ability to run the examples/ applications from the Giraph
jar-with-dependencies on YARN. I hope to do all the other "fleshing out" of the
features in separate JIRAs or subissues. This bounds the difficulty of this
first stage and lets others start working on the feature-add JIRAs without
having to know all about YARN.
The exciting thing is that the YARN API allows much finer-grained control over
a lot of our BSP process than Hadoop ever did. I was also thinking that, after
this, a port to Mesos (or wherever) is going to be really easy! As time passes,
we might consider moving the launch of our ZooKeeper instance into the
ApplicationMaster and doing more fine-grained resource allocation (assigning
input splits right at the beginning of the job run, assigning hosts to the
workers as we choose for data locality, allotting memory and/or cores depending
on the size of the splits we assign, etc.). These options really open some
doors.
BUT, even just to make the examples run, the IO problem must be solved. I do
think wrapping the MRv1-related functions (code that needs a
TaskAttemptContext, Hadoop's Job-type classes, and more) is the way to go, but
I would appreciate any ideas you might have.
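To make the wrapping idea a bit more concrete, here is a minimal Java sketch, assuming a hypothetical GiraphIoContext interface (not an existing Giraph or Hadoop API) that Giraph's IO code would depend on in place of TaskAttemptContext. All class names and the config key giraph.example.inputDir are made up for illustration; an MRv1-backed implementation would delegate to TaskAttemptContext and is left out so the sketch has no Hadoop dependency.

```java
import java.util.HashMap;
import java.util.Map;

// Demo: IO code written only against the wrapper runs the same way
// no matter which backend (MRv1 or YARN) built the context.
class IoContextDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("giraph.example.inputDir", "/data/graph"); // hypothetical key
        GiraphIoContext ctx = new YarnIoContext(conf, 3);
        System.out.println(InputPathResolver.inputPathFor(ctx)); // prints /data/graph/part-3
    }
}

// Hypothetical interface: the minimal task context Giraph IO code would need,
// independent of MRv1's TaskAttemptContext. Names are illustrative only.
interface GiraphIoContext {
    /** Returns the configuration value for key, or defaultValue if unset. */
    String get(String key, String defaultValue);

    /** Returns the id of the worker this IO code is running in. */
    int getWorkerId();
}

// Hypothetical YARN-side implementation, built from configuration the
// ApplicationMaster would ship to each container. An MRv1 implementation
// would instead delegate to a TaskAttemptContext (omitted here).
final class YarnIoContext implements GiraphIoContext {
    private final Map<String, String> conf;
    private final int workerId;

    YarnIoContext(Map<String, String> conf, int workerId) {
        this.conf = new HashMap<>(conf); // defensive copy
        this.workerId = workerId;
    }

    @Override
    public String get(String key, String defaultValue) {
        return conf.getOrDefault(key, defaultValue);
    }

    @Override
    public int getWorkerId() {
        return workerId;
    }
}

// Hypothetical IO helper that only sees the wrapper, never Hadoop MRv1 types.
final class InputPathResolver {
    static String inputPathFor(GiraphIoContext ctx) {
        String base = ctx.get("giraph.example.inputDir", "/tmp/input");
        return base + "/part-" + ctx.getWorkerId();
    }
}
```

The point of the design is that only the small context implementations know which cluster framework launched the job; the IO formats themselves would not change between MRv1 and YARN.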
Anyway, I will hopefully put up another patch tonight or tomorrow that is
another significant upgrade from what you have seen so far. All input and ideas
are appreciated, thanks again!
> Port Giraph to YARN
> -------------------
>
> Key: GIRAPH-13
> URL: https://issues.apache.org/jira/browse/GIRAPH-13
> Project: Giraph
> Issue Type: New Feature
> Reporter: Jakob Homan
> Assignee: Eli Reisman
> Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch
>
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop
> trunk, we should think about what it would take to separate out the graph
> processing bits of Giraph from the MR1-specific code so as to take advantage
> of the less-MR centric aspects of YARN, while still supporting both over the
> medium term.