[ 
https://issues.apache.org/jira/browse/GIRAPH-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-469:
-------------------------------

    Attachment: GIRAPH-469-2.patch

This is the cleaned-up patch, ready for review. It passes mvn verify under all 
the profiles. Sorry, I know it's a big one, but it only does a couple of things:

1. Refactors all the long methods in GraphMapper into easier to read, better 
documented methods.

2. Moves all the GraphMapper code that has to do with Giraph/BSP processes into 
a new GraphTaskManager. In this way GraphMapper becomes a simple wrapper to set 
up Hadoop-specific boilerplate while delegating all the work to our more 
platform neutral GraphTaskManager. This also allows GraphMapper to continue to 
inherit from Mapper.
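
To illustrate the delegation described in item 2, here is a minimal sketch. It is not taken from the patch: SketchContext is a hypothetical stand-in for Mapper#Context so the example compiles without Hadoop on the classpath, and the setup/execute/cleanup method names on the manager are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Hadoop's Mapper#Context; it only records calls. */
class SketchContext {
    final List<String> log = new ArrayList<>();

    void progress(String what) {
        log.add(what);
    }
}

/** Platform-neutral side: owns the BSP lifecycle (method names are assumptions). */
class SketchTaskManager {
    private final SketchContext context;

    SketchTaskManager(SketchContext context) {
        this.context = context;
    }

    void setup() {
        context.progress("setup");
    }

    void execute() {
        context.progress("execute");
    }

    void cleanup() {
        context.progress("cleanup");
    }
}

/** Hadoop-facing side: a thin wrapper that only delegates to the manager. */
class SketchMapper {
    void run(SketchContext context) {
        SketchTaskManager manager = new SketchTaskManager(context);
        manager.setup();
        try {
            manager.execute();
        } finally {
            // Cleanup runs even if execute() throws.
            manager.cleanup();
        }
    }
}

class DelegationSketch {
    public static void main(String[] args) {
        SketchContext context = new SketchContext();
        new SketchMapper().run(context);
        System.out.println(context.log); // prints [setup, execute, cleanup]
    }
}
```

The point of the shape is that everything Giraph-specific lives behind the manager, so the wrapper could later be swapped for a non-Hadoop entry point without touching the manager.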

I am trying to make this part of the code processing-platform independent in 
steps so I can implement a "pure YARN" mode a la GIRAPH-13. I tried not to do 
it all in one patch. As of now, the only direct pipeline from Hadoop into our 
Giraph workings is the Mapper#Context, which is still passed into the 
GraphTaskManager from the GraphMapper.

Future JIRAs on this will include:

1. Breaking out ZookeeperManager into an interface, and setting up a parallel 
impl that will spawn a YARN app container-hosted ZK instance.

2. Determining the extent of the things a replacement interface for 
Mapper#Context would have to do for Giraph, and replacing the Mapper#Context we 
get from GraphMapper (and Hadoop) with this interface so we can implement 
alternate implementations that let Giraph get what it needs from the underlying 
cluster without being Hadoop specific. You get the idea...
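
As a rough sketch of the idea in item 2, the replacement for Mapper#Context could look something like the interface below. Everything here is hypothetical: the interface name, its three methods, the in-memory implementation, and the "sketch.max.supersteps" key are invented for illustration, and the real set of operations Giraph needs is exactly what that future JIRA would have to determine.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical platform-neutral view of what a task needs from the cluster. */
interface ClusterTaskContext {
    String getSetting(String key, String defaultValue);

    void reportProgress();

    void setStatus(String status);
}

/** One possible backend: in-memory, e.g. for tests or a non-Hadoop runtime. */
class InMemoryTaskContext implements ClusterTaskContext {
    private final Map<String, String> settings = new HashMap<>();
    int progressCalls = 0;
    String lastStatus = "";

    InMemoryTaskContext set(String key, String value) {
        settings.put(key, value);
        return this;
    }

    @Override
    public String getSetting(String key, String defaultValue) {
        return settings.getOrDefault(key, defaultValue);
    }

    @Override
    public void reportProgress() {
        progressCalls++;
    }

    @Override
    public void setStatus(String status) {
        lastStatus = status;
    }
}

/** Task code depends only on the interface, never on a Hadoop type. */
class SuperstepRunner {
    static int runSupersteps(ClusterTaskContext context) {
        // "sketch.max.supersteps" is a made-up key for this example.
        int max = Integer.parseInt(context.getSetting("sketch.max.supersteps", "3"));
        for (int i = 0; i < max; i++) {
            context.setStatus("superstep " + i);
            context.reportProgress();
        }
        return max;
    }
}
```

A Hadoop-backed implementation would forward these calls to the Mapper#Context it wraps, while a YARN-backed one would get the same information from its own container environment.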

Thanks, I'll try to throw this up on ReviewBoard as well.
                
> Cleanup GraphMapper
> -------------------
>
>                 Key: GIRAPH-469
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-469
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Nitay Joffe
>            Assignee: Eli Reisman
>         Attachments: GIRAPH-469-1-eli-idea.patch, GIRAPH-469-2.patch
>
>
> I don't see why we even call a map() method seeing as we are overriding 
> run(). We are clearly not particularly "mapreduce-y" so we should make our 
> entry point more clear than a map(). Also I think we should have something 
> like a WorkerThread similar to MasterThread and clean up all of this to just 
> create whichever threads correspond to the roles the node is assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
