Re: [jira] [Commented] (GIRAPH-13) Port Giraph to YARN

Eli Reisman Mon, 04 Mar 2013 14:26:59 -0800

oops accidental "send". Anyway, you can divvy up the memory into any number
of workers you like for a local or cluster job; it's all the same. There are
a few little issues in the currently available patch that I have fixed, but
I am holding off on the next patch until I have the YARN experts here take
a peek to see what should be happening with the tasks telling the app
master they are done (success or fail) and the app master finalizing the
job.  All this will be done soon.
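
To make the memory split concrete, here is a rough sketch of the arithmetic behind the 6GB example in the quoted message below. It is only an illustration: the function name plan_tasks and the 1024 MB set aside for YARN management are assumptions, not values taken from the patch.

```python
def plan_tasks(available_mb, heap_per_task_mb, yarn_overhead_mb=1024):
    """Return (total_tasks, workers) that fit in the available memory.

    yarn_overhead_mb is a hypothetical allowance for the heap YARN
    itself needs; the real figure depends on the cluster setup.
    """
    if heap_per_task_mb <= 0:
        raise ValueError("heap_per_task_mb must be positive")
    usable_mb = available_mb - yarn_overhead_mb
    total_tasks = max(usable_mb // heap_per_task_mb, 0)
    # One task acts as the master; the rest are workers.
    workers = max(total_tasks - 1, 0)
    return total_tasks, workers


print(plan_tasks(6144, 1024))  # 5 tasks of 1GB: 1 master, 4 workers
print(plan_tasks(6144, 2048))  # 2 tasks of 2GB
```

The same calculation applies whether the memory belongs to one local machine or to a cluster.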

The last big decision to make is around backward compatibility. Both the Client
and AppMaster APIs have changed all through the 2.0.x-alpha line. Right
now I think I will support a default of 2.0.2-alpha (the current patch is
set to 2.0.3) and will also support 2.0.3, 2.0.4, and trunk (3.0 snapshot)
-- I can backport the GiraphYarnClient to the old Client API to support
2.0.1 and 2.0.0 in a future JIRA. But I was told to use the newer one, and
the old one was having some trouble even on 2.0.2, which I definitely want
to support. So we can start there if that's OK with everyone? Let me know.

This will run using the hadoop_yarn profile, with the per-task heap size
passed as -yh and the comma-separated jar name list (no paths) as -yj;
otherwise everything is the same as normal. I will put up something on the
wiki when it's done to give a quick overview, and file some more JIRAs.
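
As a sketch of how the two YARN options might be assembled, the helper below builds the -yh and -yj arguments from a heap size and a list of jar locations. The helper name yarn_options is hypothetical, and passing the heap size as a number of megabytes is an assumption; only the flag names and the "names only, comma-separated" rule come from this message.

```python
import posixpath


def yarn_options(heap_mb, jar_paths):
    """Build the -yh / -yj arguments described above (hypothetical helper).

    heap_mb: per-task heap size (megabytes is an assumed unit).
    jar_paths: jar locations; only the file names are kept, since the
    -yj list takes jar names without paths.
    """
    jar_names = [posixpath.basename(path) for path in jar_paths]
    return ["-yh", str(heap_mb), "-yj", ",".join(jar_names)]


print(yarn_options(1024, ["/tmp/build/giraph-core.jar", "lib/my-app.jar"]))
```

The jar file names in the usage line are placeholders chosen for the example.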

For one thing, a lot of our in-flight data will go straight to the logs
until we have our own WebUI for the App Master.

I will also suggest a future JIRA to break up the o.a.g.yarn package files
into subpackages mirroring the giraph-core o.a.g top-level dir. This way,
it only compiles if it's under the o.a.g.yarn parent package, but we can add
code to replace various functionalities in MRv1 Giraph and just stitch them
conditionally into the BSP code without munge flags in the source code
(just the imports sometimes). This would allow us to start doing a bunch of
stuff that MRv1 won't let us do, and that might be really powerful for
Giraph, without breaking MRv1. At this point I'm not even sure factoring
out Context stuff from Giraph would do anything but make our IO formats
even more of a pain.

Best of all, I think the approach I'm using to set up the job in
GiraphYarnTask could work to fool other cluster frameworks into running
Giraph (like Mesos) without ever internally having to break our MR IO
formats. Or not until we want to start stepping back from Hadoop.

And thanks for mentioning the patch as an example of YARN usage; that's what
I want. I would like this to be a go-to example of getting a "real application"
to run on top of pure YARN for others who want to move to YARN. I think HW
will like that too. ;)

On Mon, Mar 4, 2013 at 2:15 PM, Eli Reisman <[email protected]>wrote:

> I think I will use Yarn MiniCluster to verify my AM comes up, and maybe
> some no-op 1 node job. I will not test any BSP code as that is all running
> against the MRv1 interface still (I didn't have to change or munge anything
> but GiraphRunner!) so those tests verify that BSP works.
>
> The cool thing about our implementation is that it will run on a local
> YARN setup on one machine with any division of labor and heap you feel
> like. So if the cluster has 6GB available, you can run Giraph on a bit less
> than that (each app uses some heap for YARN management stuff) as 5 1GB
> nodes (1 master 4 workers) or 2 2GB (1
>
>
> On Mon, Mar 4, 2013 at 9:35 AM, Hyunsik Choi (JIRA) <[email protected]>wrote:
>
>>
>>     [
>> https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592389#comment-13592389]
>>
>> Hyunsik Choi commented on GIRAPH-13:
>> ------------------------------------
>>
>> Eli,
>>
>> This patch looks like a well-written reference code. Actually, I have
>> learned good usage of Yarn from your patch. I'm looking forward to the
>> complete work.
>>
>> How about the plan for unit test? Are you planning to use MiniYarnCluster
>> for integration test?
>>
>> > Port Giraph to YARN
>> > -------------------
>> >
>> >                 Key: GIRAPH-13
>> >                 URL: https://issues.apache.org/jira/browse/GIRAPH-13
>> >             Project: Giraph
>> >          Issue Type: New Feature
>> >            Reporter: Jakob Homan
>> >            Assignee: Eli Reisman
>> >         Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch,
>> GIRAPH-13-3.patch, GIRAPH-13-4.patch, GIRAPH-13-5.patch, GIRAPH-13-6.patch,
>> GIRAPH-13-7.patch, GIRAPH-13-8.patch
>> >
>> >
>> > Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the
>> Hadoop trunk, we should think about what it would take to separate out the
>> graph processing bits of Giraph from the MR1-specific code so as to take
>> advantage of the less-MR centric aspects of YARN, while still supporting
>> both over the medium term.
>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>
>
>
