@Alessandro: I see now in the bsp IO code that we really are just fooling the Mapper generic K,V params, and we're handling our own input and output more directly than I realized. Thanks for the help. I'm now thinking Hadoop may be doing less for us in the IO areas than I thought, which will make the YARN work much easier (I think). Thanks again for your input!
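For anyone following the thread: the "fooling the Mapper generics" idiom being described can be sketched roughly as below. This is an illustrative stand-in, not the actual Giraph source — the `Mapper` base class here is a minimal mock of Hadoop's, and `GraphMapperSketch` and its method names are hypothetical:

```java
// Minimal stand-in for Hadoop's Mapper, just enough to show the pattern.
abstract class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    public abstract void run();
}

// The "fooled generics" idiom: the framework sees a
// Mapper<Object,Object,Object,Object>, but run() never touches the
// framework's key/value record stream -- all real input and output
// is handled internally by the job's own input/output format logic.
class GraphMapperSketch extends Mapper<Object, Object, Object, Object> {
    private final StringBuilder trace = new StringBuilder();

    @Override
    public void run() {
        // Instead of iterating (key, value) records fed by Hadoop,
        // load the graph and write results through our own IO paths.
        trace.append("setup;");
        trace.append("loadGraphViaOwnInputFormat;");
        trace.append("compute;");
        trace.append("writeOutputViaOwnOutputFormat;");
    }

    public String trace() {
        return trace.toString();
    }
}

public class Main {
    public static void main(String[] args) {
        GraphMapperSketch mapper = new GraphMapperSketch();
        mapper.run();
        System.out.println(mapper.trace());
    }
}
```

The point of the sketch: since the generic params carry no real data, a YARN container could run the same internal lifecycle without any Mapper wrapper at all.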
On Mon, Feb 11, 2013 at 10:49 AM, Eli Reisman <[email protected]> wrote:

> It's great to hear that, because it's what I'm torn about too: how do you
> not duplicate Hadoop code, but keep Giraph's framework ties loosely coupled
> as I add YARN code? I'm really trying to avoid a munge flag, but at this
> point I'm getting stuck because the YARN setup code won't compile with all
> of our Hadoop profiles anyway. So now I'm just trying to minimize the
> number of "munge points" in the Giraph code. This will make the glue much
> cleaner!
>
> In the end, it sounds like I will be able to avoid duplicating the input
> split code, since you have it done there. But it sounds like I must still
> duplicate from Hadoop the code that actually feeds the record readers and
> commits output, since we have no Hadoop and no GraphMapper with In and Out
> params to give it to us any more. That's still less than I thought I would
> have to duplicate/deal with. Yay!
>
> On Fri, Feb 8, 2013 at 3:06 PM, Alessandro Presta <[email protected]> wrote:
>
>> Hi Eli,
>>
>> Yes, GiraphFileInputFormat deals with input splitting in all cases. Note
>> that most of the logic is the same as in current Hadoop, and we extend
>> Hadoop's FileInputFormat.
>> I wish there was a way to avoid any code duplication, but this is messing
>> with implementation-specific code that is mostly private.
>>
>> Alessandro
>>
>> On 2/8/13 2:58 PM, "Eli Reisman" <[email protected]> wrote:
>>
>> >Hey (maybe @Alessandro, don't know...) I have been looking at the
>> >GiraphFileInputFormat. Am I crazy, or with the advent of edge- or vertex-
>> >based input files, do we now always generate our own input splits, from
>> >scratch, without Hadoop being involved? And if so, is this defaulted to
>> >"on" no matter what, or only when we have dual edge-vertex input
>> >information to process? If so, it's one less thing I will have to
>> >implement for the YARN implementation.
>> >
>> >Thanks, looking forward to hearing back,
>> >
>> >Eli
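Since the thread turns on what FileInputFormat-style split generation actually involves, here is a self-contained sketch of its core step: carving a file of a given byte length into (offset, length) splits of at most a target split size. Class and method names are illustrative, not taken from Giraph or Hadoop:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the heart of a getSplits()-style computation:
// divide a file into contiguous byte ranges no larger than splitSize.
public class SplitSketch {
    // Minimal stand-in for Hadoop's FileSplit: a byte range in one file.
    static final class Split {
        final long offset;
        final long length;
        Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    static List<Split> computeSplits(long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        long remaining = fileLength;
        while (remaining > 0) {
            long start = fileLength - remaining;
            long len = Math.min(splitSize, remaining);
            splits.add(new Split(start, len));
            remaining -= len;
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 250-byte "file" with 100-byte splits yields three splits:
        // (0,100), (100,100), (200,50).
        List<Split> splits = computeSplits(250, 100);
        System.out.println(splits.size());
    }
}
```

The real Hadoop logic additionally consults block locations, min/max split config, and record boundaries, which is exactly the private, implementation-specific code Alessandro mentions being hard to reuse without duplication.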
