@Alessandro: I see now in the bsp IO code that we really are just fooling the Mapper generic K,V params, and we're handling our own input and output more directly than I realized. Thanks for the help. I'm now thinking Hadoop may be doing less for us in the IO areas than I thought, which will make the YARN work much easier (I think). Thanks again for your input!
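For anyone following the thread: the "fooling the Mapper generics" idiom being described can be sketched roughly as below. This is an illustrative stand-in, not the actual Giraph source — the `Mapper` base class here is a minimal mock of Hadoop's, and `GraphMapperSketch` and its method names are hypothetical:

```java
// Minimal stand-in for Hadoop's Mapper, just enough to show the pattern.
abstract class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    public abstract void run();
}

// The "fooled generics" idiom: the framework sees a
// Mapper<Object,Object,Object,Object>, but run() never touches the
// framework's key/value record stream -- all real input and output
// is handled internally by the job's own input/output format logic.
class GraphMapperSketch extends Mapper<Object, Object, Object, Object> {
    private final StringBuilder trace = new StringBuilder();

    @Override
    public void run() {
        // Instead of iterating (key, value) records fed by Hadoop,
        // load the graph and write results through our own IO paths.
        trace.append("setup;");
        trace.append("loadGraphViaOwnInputFormat;");
        trace.append("compute;");
        trace.append("writeOutputViaOwnOutputFormat;");
    }

    public String trace() {
        return trace.toString();
    }
}

public class Main {
    public static void main(String[] args) {
        GraphMapperSketch mapper = new GraphMapperSketch();
        mapper.run();
        System.out.println(mapper.trace());
    }
}
```

The point of the sketch: since the generic params carry no real data, a YARN container could run the same internal lifecycle without any Mapper wrapper at all.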
On Mon, Feb 11, 2013 at 10:49 AM, Eli Reisman <[email protected]> wrote:

> It's great to hear that, because it's what I'm torn about too: how do you
> not duplicate Hadoop code, but keep Giraph's framework ties loosely coupled
> as I add YARN code? I'm really trying to avoid a munge flag, but at this
> point I'm getting stuck because the YARN setup code won't compile with all
> of our Hadoop profiles anyway. So now I'm just trying to minimize the
> number of "munge points" in the Giraph code. This will make the glue much
> cleaner!
>
> In the end, it sounds like I will be able to avoid duplicating the input
> split code, since you have it done there. But it sounds like I must still
> duplicate from Hadoop the code that actually feeds the record readers and
> commits output, since we have no Hadoop and no GraphMapper with In and Out
> params to give it to us any more. That's still less than I thought I would
> have to duplicate/deal with. Yay!
>
> On Fri, Feb 8, 2013 at 3:06 PM, Alessandro Presta <[email protected]> wrote:
>
>> Hi Eli,
>>
>> Yes, GiraphFileInputFormat deals with input splitting in all cases. Note
>> that most of the logic is the same as in current Hadoop, and we extend
>> Hadoop's FileInputFormat.
>> I wish there was a way to avoid any code duplication, but this is messing
>> with implementation-specific code that is mostly private.
>>
>> Alessandro
>>
>> On 2/8/13 2:58 PM, "Eli Reisman" <[email protected]> wrote:
>>
>> >Hey (maybe @Alessandro, don't know...) I have been looking at the
>> >GiraphFileInputFormat. Am I crazy, or with the advent of edge- or vertex-
>> >based input files, do we now always generate our own input splits, from
>> >scratch, without Hadoop being involved? And if so, is this defaulted to
>> >"on" no matter what, or only when we have dual edge-vertex input
>> >information to process? If so, it's one less thing I will have to
>> >implement for the YARN implementation.
>> >
>> >Thanks, looking forward to hearing back,
>> >
>> >Eli
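Since the thread turns on what FileInputFormat-style split generation actually involves, here is a self-contained sketch of its core step: carving a file of a given byte length into (offset, length) splits of at most a target split size. Class and method names are illustrative, not taken from Giraph or Hadoop:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the heart of a getSplits()-style computation:
// divide a file into contiguous byte ranges no larger than splitSize.
public class SplitSketch {
    // Minimal stand-in for Hadoop's FileSplit: a byte range in one file.
    static final class Split {
        final long offset;
        final long length;
        Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    static List<Split> computeSplits(long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        long remaining = fileLength;
        while (remaining > 0) {
            long start = fileLength - remaining;
            long len = Math.min(splitSize, remaining);
            splits.add(new Split(start, len));
            remaining -= len;
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 250-byte "file" with 100-byte splits yields three splits:
        // (0,100), (100,100), (200,50).
        List<Split> splits = computeSplits(250, 100);
        System.out.println(splits.size());
    }
}
```

The real Hadoop logic additionally consults block locations, min/max split config, and record boundaries, which is exactly the private, implementation-specific code Alessandro mentions being hard to reuse without duplication.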
