Hi Claudio,

It's fine, I wasn't trying to take credit for the things you and others came up with; I'm sorry if it sounded that way. Since there were a lot of great ideas in the discussion, I just wanted to make clear which of them I'm using and which I'm not, so one wouldn't have to go through the whole code to find that out.
Maja

On 8/24/12 7:23 PM, "Claudio Martella" <[email protected]> wrote:

>Hi Maja,
>
>I didn't want to minimize your effort, which is great; I just wanted to
>underline the complete overlap between the previous plan and your work.
>The design had moved from BTrees + Bloom filters towards sorted
>partitions, as in the same discussion (17/Dec/11 15:28) and as presented
>at the Berlin workshop on the 6th of June
>(http://prezi.com/ecdgiav4oeex/bb-apache-giraph-distributed-graph-processing-in-the-cloud/).
>It really looks to me like your contribution is one of the best ways to
>tackle the problem.
>
>Best,
>Claudio
>
>On Wed, Aug 1, 2012 at 8:43 PM, Maja Kabiljo <[email protected]> wrote:
>> Another thing, I think I should explain what from the GIRAPH-45
>> discussion I am actually using here, since I don't use Bloom filters
>> and BTrees. The way it works is the following:
>> - Inside the outer message store we have a message store for each of
>>   the partitions separately.
>> - Partition message stores keep data in an ordered map (ordered by
>>   vertex id).
>> - In the outer message store we check whether we should flush something
>>   (do we have more than the allowed number of messages in memory).
>>   While we do, we flush the partition with the largest number of
>>   messages in memory.
>> - When a partition message store is flushed, all of its data is written
>>   to a file in the order of vertex ids; the file content looks like:
>>   num_vertices
>>   vertex_1_id num_messages_1 message_1_1 message_1_2 ...
>>   vertex_2_id num_messages_2 message_2_1 message_2_2 ...
>>   ...
>> - In the end each partition will have some messages in memory and N
>>   files, where N is the number of times it was flushed.
>> - When it's time to do the computation, within a single partition we
>>   call the compute methods in order of vertex ids.
>> - We use buffered streams and read data from all partition files
>>   sequentially, since we need the data in the same order it was written
>>   in each of the files. This way we limit the number of random file
>>   accesses.
>
>--
>    Claudio Martella
>    [email protected]
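To make the quoted description a bit more concrete, here is a rough Java sketch of one partition's message store and of the forward-only reader used at compute time. This is not the actual patch code: all class and method names are made up, vertex ids are hard-coded as longs and messages as doubles purely for illustration, and the outer store (which tracks total memory usage and picks the fullest partition to flush) is left out.

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionMessageStoreSketch {
  // Messages currently held in memory, keyed and ordered by vertex id,
  // so flushes and reads both see vertices in the same order.
  private final TreeMap<Long, List<Double>> inMemory = new TreeMap<>();
  // One file per flush of this partition.
  private final List<File> flushedFiles = new ArrayList<>();
  private int messagesInMemory = 0;

  public void addMessage(long vertexId, double message) {
    inMemory.computeIfAbsent(vertexId, id -> new ArrayList<>()).add(message);
    messagesInMemory++;
  }

  // The outer store would use this to pick the fullest partition to flush.
  public int getMessagesInMemory() {
    return messagesInMemory;
  }

  /** Spill everything currently in memory to a new file, in vertex id order. */
  public void flush() throws IOException {
    File file = File.createTempFile("partition-messages-", ".bin");
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(file)))) {
      // Layout: num_vertices, then per vertex:
      //   vertex_id num_messages message_1 message_2 ...
      out.writeInt(inMemory.size());
      for (Map.Entry<Long, List<Double>> e : inMemory.entrySet()) {
        out.writeLong(e.getKey());
        out.writeInt(e.getValue().size());
        for (double m : e.getValue()) {
          out.writeDouble(m);
        }
      }
    }
    flushedFiles.add(file);
    inMemory.clear();
    messagesInMemory = 0;
  }

  /** Forward-only, buffered reader over one flushed file. */
  public static class SequentialReader implements AutoCloseable {
    private final DataInputStream in;
    private int verticesLeft;
    private long nextVertexId;
    private int nextMessageCount;

    public SequentialReader(File file) throws IOException {
      in = new DataInputStream(
          new BufferedInputStream(new FileInputStream(file)));
      verticesLeft = in.readInt();
      advance();
    }

    private void advance() throws IOException {
      if (verticesLeft-- > 0) {
        nextVertexId = in.readLong();
        nextMessageCount = in.readInt();
      } else {
        nextVertexId = Long.MAX_VALUE;  // this file is exhausted
      }
    }

    // Vertices are computed in increasing id order and the file was written
    // in that order, so we only ever need to look at the next record.
    public void appendMessagesFor(long vertexId, List<Double> messages)
        throws IOException {
      if (nextVertexId == vertexId) {
        for (int i = 0; i < nextMessageCount; i++) {
          messages.add(in.readDouble());
        }
        advance();
      }
    }

    @Override
    public void close() throws IOException {
      in.close();
    }
  }
}

At compute time one SequentialReader would be opened per flushed file of the partition, and for each vertex id (visited in increasing order) the messages are gathered from the in-memory map plus appendMessagesFor() on every reader. Because every file was written in that same order, each reader only ever moves forward, which is what keeps the number of random file accesses down.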
