I've been investigating further the case in which we run out of memory even when using out-of-core messages, and here is what I've discovered. I count how many messages a worker has sent by incrementing a counter in SendPartitionMessagesRequest.write(), and how many it has received by incrementing another in SendPartitionMessagesRequest.readFields(). Even on smaller examples, those two numbers differ significantly during a superstep (at the end of the superstep they converge, of course).
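For reference, the instrumentation is along these lines (a sketch only: the class and counter names are mine, not existing Giraph code, and I count whole requests here for brevity where the real request carries many messages per call):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.concurrent.atomic.AtomicLong;

    import org.apache.hadoop.io.Writable;

    // Sketch of the counting described above. The AtomicLong fields are
    // illustrative additions, not existing Giraph code. The real
    // SendPartitionMessagesRequest carries many messages per request,
    // so you would add the per-request message count rather than 1.
    public class InstrumentedRequest implements Writable {
      /** Incremented on the sender when a request is serialized. */
      private static final AtomicLong SENT = new AtomicLong();
      /** Incremented on the receiver when a request is deserialized. */
      private static final AtomicLong RECEIVED = new AtomicLong();

      @Override
      public void write(DataOutput output) throws IOException {
        // ... existing serialization of the partition messages ...
        SENT.incrementAndGet();
      }

      @Override
      public void readFields(DataInput input) throws IOException {
        // ... existing deserialization ...
        RECEIVED.incrementAndGet();
      }
    }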
In one of the examples, I ran RandomMessageBenchmark with 50k vertices, 100 edges per vertex, 100 messages per edge, 0.5KB per message, 2 workers, and 16GB per worker. Before the crash, I could see that the number of messages sent was around 28M, while the number of messages received was only 8M. I don't know the details of how Netty is implemented, but I suppose it has to keep sent messages until it receives confirmation from the destination, and that's why we run out of memory: the ~20M unacknowledged messages at 0.5KB each amount to roughly 10GB of payload, before any object or buffer overhead, on a 16GB worker.

I was able to fix this by adding an occasional nettyClient.waitAllRequests() call (a rough sketch of what I mean is at the end of this message), and a problem of this size now finishes successfully! Adding these calls slows the algorithm down a bit, so I'll create a patch that adds this as an option, disabled by default. After that, by tweaking the out-of-core and these parameters, we should be able to run jobs with any amount of message data.

On 8/4/12 12:16 AM, "Eli Reisman" <[email protected]> wrote:

>I like the idea of keeping the messages out of the vertices; there is a lot
>of unneeded data copying/GC going on, and if this eliminates some of that
>it would be fantastic and, I think, a big help through the whole job run,
>memory-wise.
>
>On Fri, Aug 3, 2012 at 4:03 AM, Gianmarco De Francisci Morales <
>[email protected]> wrote:
>
>> Hi,
>>
>> > > Are you saying that out-of-core is faster than hitting memory
>> > > boundaries (i.e. GC)? It is a bit tough to imagine that out-of-core
>> > > beats in-core =).
>> >
>> > That's the only explanation I could think of; honestly, it sounds
>> > wrong to me too. But those are the results I keep getting. If someone
>> > has a better one, I'd love to hear it :-)
>>
>> I am not surprised.
>> Streaming sequentially from disk is faster than reading randomly from
>> memory [1].
>> Add the GC overhead, and you get an explanation for your results.
>>
>> [1] The Pathologies of Big Data,
>> http://queue.acm.org/detail.cfm?id=1563874
>>
>> Cheers,
>> --
>> Gianmarco
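P.S. Here is a minimal sketch of the throttling idea mentioned above. Only waitAllRequests() is the real NettyClient call; the wrapper class, the threshold, and their names are hypothetical, and the NettyClient import path may differ by version:

    // Minimal sketch, not the patch itself: only waitAllRequests() is
    // the real NettyClient method; the wrapper, field names, and the
    // threshold are hypothetical.
    import org.apache.giraph.comm.NettyClient;

    public class ThrottledSender {
      /** Drain in-flight requests after this many sends (tunable knob). */
      private final int maxUnackedRequests;
      private final NettyClient nettyClient;
      private int requestsSinceWait = 0;

      public ThrottledSender(NettyClient nettyClient, int maxUnackedRequests) {
        this.nettyClient = nettyClient;
        this.maxUnackedRequests = maxUnackedRequests;
      }

      /** Call right after each request is handed to the client. */
      public void afterSend() {
        if (++requestsSinceWait >= maxUnackedRequests) {
          // Block until every outstanding request has been confirmed by
          // its destination, bounding memory held by unacked messages.
          nettyClient.waitAllRequests();
          requestsSinceWait = 0;
        }
      }
    }

The tradeoff is exactly the slowdown mentioned above: each wait stalls the sender until all in-flight requests are confirmed, which is why I'd keep it off by default and expose the threshold as an option.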
