[ https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416716#comment-13416716 ]

Eli Reisman commented on GIRAPH-249:
------------------------------------

The partitioning, when we actually get the vertices to the remote hosts, is done this 
way in HashMasterPartitioner:

a potential set of X^2 partitionIds is created and assigned by the master, so each of 
the X workers gets X potential partitions (some of which end up empty from what I see 
in the logs, depending on the size and number of data files; this is a potential area 
for a fix as well).
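
To make that concrete, here's a rough sketch of the idea only (not the actual 
HashMasterPartitioner code, and the names are made up):

class HashPartitionSketch {
  // X workers => X * X potential partition ids, so each worker can own X of them.
  private final int partitionCount;

  HashPartitionSketch(int numWorkers) {
    this.partitionCount = numWorkers * numWorkers;
  }

  // A vertex lands in a partition purely by hashing its id; abs() is applied after
  // the modulo so an Integer.MIN_VALUE hash code can't overflow.
  int partitionOf(Object vertexId) {
    return Math.abs(vertexId.hashCode() % partitionCount);
  }
}

So with X = 5 workers that gives 25 potential partitions, which is why some of them 
come out empty on small inputs.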

These, once assigned, are queried when the worker is reading InputSplits and needs to 
store the map of temp partitions. Then, when a temp partition fills, it is given to 
Netty to send to whichever PartitionOwner the master set up to own that partitionId: 
it is broken down into a partitionId and a Collection<BasicVertex> and sent over that 
worker's Netty connection.
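
Roughly, the worker-side flow looks like this (sketch only; the interface and names 
here are hypothetical, the real code hands the collection to the Netty client):

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the temp-partition buffering while a worker reads an InputSplit. */
class TempPartitionBuffer<V> {
  /** Stand-in for handing (partitionId, vertices) to the owning worker over Netty. */
  interface PartitionSender<V> {
    void send(int partitionId, Collection<V> vertices);
  }

  private final Map<Integer, List<V>> tempPartitions = new HashMap<>();
  private final int maxTempSize;
  private final PartitionSender<V> sender;

  TempPartitionBuffer(int maxTempSize, PartitionSender<V> sender) {
    this.maxTempSize = maxTempSize;
    this.sender = sender;
  }

  void add(int partitionId, V vertex) {
    List<V> temp = tempPartitions.computeIfAbsent(partitionId, id -> new ArrayList<>());
    temp.add(vertex);
    if (temp.size() >= maxTempSize) {
      // Ship the filled temp partition to whichever PartitionOwner owns this id.
      sender.send(partitionId, temp);
      tempPartitions.remove(partitionId);
    }
  }
}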

The partitions are assigned by the master by simply iterating over the list of 
healthy workers at the very beginning of the job and handing out partition ids one at 
a time in a simple round-robin pass, like this:

WORKER 1  2  3  4  5
       -  -  -  -  -
PARTS  1  2  3  4  5
       6  7  8  9  10
       11 12 13 14 15

and so on... Does this seem like it could be improved? Jakob thought it was OK for 
now; it does create an even spread across the available workers for the hashing. But 
it leaves each worker with a list of potential partitions to deal with when the work 
is done, some of which are empty in a given run. I know you're an algorithm guy, so 
what do you think?
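
For reference, the current scheme is basically just this (sketch with made-up names, 
not the master's actual code):

import java.util.ArrayList;
import java.util.List;

class RoundRobinAssignmentSketch {
  // Partition p goes to worker (p % numWorkers), which is what the table above shows
  // (0-based here): worker 0 owns 0, 5, 10, ...; worker 1 owns 1, 6, 11, ...; etc.
  static int ownerOf(int partitionId, int numWorkers) {
    return partitionId % numWorkers;
  }

  // The list of potential partition ids each worker ends up holding, empty or not.
  static List<Integer> partitionsFor(int workerIndex, int numWorkers, int partitionCount) {
    List<Integer> owned = new ArrayList<>();
    for (int p = workerIndex; p < partitionCount; p += numWorkers) {
      owned.add(p);
    }
    return owned;
  }
}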

It might be good to try your code at a .15 or even .12 ratio for spilling to disk and 
see what happens. When I got it tuned right on the one job, it definitely helped. But 
yes, those other patches increased my data throughput by generous double digits over 
the weekend, and they are small (bug fixes, really) and could easily be improved on 
later.
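
By "ratio" I mean a check along these lines (sketch only, using plain JVM numbers; I'm 
not quoting the patch's actual threshold logic, and the names are made up):

class SpillCheckSketch {
  // Start spilling partitions to disk once the fraction of heap still free drops
  // below the configured ratio (e.g. 0.15 or 0.12).
  static boolean shouldSpill(double minFreeRatio) {
    Runtime rt = Runtime.getRuntime();
    long used = rt.totalMemory() - rt.freeMemory();
    double freeFraction = 1.0 - (double) used / rt.maxMemory();
    return freeFraction < minFreeRatio;
  }
}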

                
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
>                 Key: GIRAPH-249
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-249
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, 
> GIRAPH-249.patch, GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping 
> the whole graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of 
> memory, while gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate 
> issue, although the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job 
> (albeit slowly) instead of failing when the graph is too big, while still 
> encouraging memory optimizations and high-memory clusters; or restructuring 
> Giraph to be as efficient as possible in disk mode, making it almost a 
> standard way of operating.
