Re: Review Request: Out-of-core messages

Claudio Martella Fri, 24 Aug 2012 11:24:27 -0700

Hi Maja,

I didn't want to minimize your effort which is great, I just wanted to
underline the complete overlap between the previous plan and your
work. The designed had moved from BTrees+Bloomfilters towards sorted
partitions, as in the same discussion (17/Dec/11 15:28) and as
presented at the Berlin workshop on the 6th of june
(http://prezi.com/ecdgiav4oeex/bb-apache-giraph-distributed-graph-processing-in-the-cloud/).
It really looks to me like your contribution is one of the best ways
to tackle the problem.


Best,
Claudio

On Wed, Aug 1, 2012 at 8:43 PM, Maja Kabiljo <[email protected]> wrote:
> Another thing, I think I should explain what from GIRAPH-45 discussion am
> I actually using here, since I don't use bloomfilters and BTrees. The way
> it works is the following:
> - Inside the outer message store we have message stores for each of the
> partitions separately.
> - Partition message stores keep data in ordered map (ordered by vertex id).
> - In outer messages store we check if we should flush something (do we
> have more than allowed number of messages in memory). While we do, we
> flush the partition with largest number of messages in memory.
> - When partition messages store is flushed, all the data is written to a
> file in the order of vertex ids, file content is like:
> num_vertices
> vertex_1_id num_messages_1 message_1_1 message_1_2 ...
> vertex_2_id num_messages_2 message_2_1 message_2_2 ...
> ...
> - In the end each partition will have some messages in memory, and N
> files, where N is the number of times it was flushed.
> - When it's time to do the computation, within a single partition we call
> compute methods in order of vertex ids.
> - We use buffered streams and read data from all partition files
> sequentially, since we'll need data in the same order it's written in each
> of the files. This way we limit number of random file accesses.



-- 
   Claudio Martella
   [email protected]

Re: Review Request: Out-of-core messages

Reply via email to