[ https://issues.apache.org/jira/browse/GIRAPH-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437194#comment-13437194 ]

Eli Reisman commented on GIRAPH-301:
------------------------------------

With all the writes ZK does to create FINISHED and RESERVED nodes during the
split-claiming process, some of the readers are getting stuck behind lots of ZK
syncs and are moving very slowly down their split lists, if my logs are to be
believed. Many are not making it to the bottom of the list by the time all the
splits are read or (worse yet) by the time one or more workers have read too
many splits and become overloaded, causing job failure. This is with worker
counts in the thousands, so you may not see behavior like this with 50-100
workers; I did not. But we are trying for lots of workers here to spread the
memory load out. I have a different approach in mind that does not call ZK, but
instead tries to set each worker up so that it only has to check a few nodes
before successfully claiming one, without losing locality. I will have a patch
up soon and will test it this weekend. It will re-establish the hashing from
250, as that really did seem to spread the work out more evenly, without so many
iterations over the list per worker. We'll see what happens...
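A minimal sketch of the hashed-start idea, written in Java since that is
Giraph's language. This is not the patch: the class name, `startIndex`,
`visitOrder`, and the multiplicative mixing constant are all hypothetical. Each
worker derives a starting offset from its task id and walks the split list from
there, wrapping around, so different workers tend to begin on different splits
instead of all contending for the head of the list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: hash-derived starting offsets into the split list. */
class HashedSplitOrder {
  /** Picks a starting split index for a worker from its task id. */
  static int startIndex(int taskId, int numSplits) {
    // Multiplicative mix with a golden-ratio constant (illustrative only).
    // Integer overflow simply wraps around, which is fine for mixing.
    int mixed = taskId * 0x9E3779B9;
    // Math.floorMod keeps the result in [0, numSplits) even if mixed < 0.
    return Math.floorMod(mixed, numSplits);
  }

  /** Order in which a worker would check splits: start, then wrap around. */
  static List<Integer> visitOrder(int taskId, int numSplits) {
    int start = startIndex(taskId, numSplits);
    List<Integer> order = new ArrayList<>(numSplits);
    for (int i = 0; i < numSplits; i++) {
      order.add((start + i) % numSplits);
    }
    return order;
  }

  public static void main(String[] args) {
    int numSplits = 100;
    for (int taskId = 0; taskId < 8; taskId++) {
      System.out.println("task " + taskId + " starts at split "
          + startIndex(taskId, numSplits));
    }
  }
}
```

Locality preferences are not modeled in this sketch; it only shows how a
per-worker offset spreads the first reservation attempts across the list.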

                
> InputSplit Reservations are clumping, leaving many workers asleep while others
> process too many splits and get overloaded.
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-301
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-301
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp, graph, zookeeper
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>              Labels: patch
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-301-1.patch, GIRAPH-301-2.patch
>
>
> With recent additions to the codebase, users here have noticed that many
> workers are able to load input splits extremely quickly, and this has altered
> the behavior of Giraph during INPUT_SUPERSTEP when using the current algorithm
> for split reservations. A few workers process multiple splits (often
> overwhelming Netty and getting GC errors as they attempt to offload too much
> data too quickly), while many (often most) of the others just sleep through
> the superstep, never successfully participating at all.
> Essentially, the current algorithm is:
> 1. Scan the input split list, skipping nodes that are marked "Finished".
> 2. Grab the first unfinished node in the list (reserved or not) and check its
> reserved status.
> 3. If it is not reserved, attempt to reserve it, and return it if successful.
> 4. If the first one you check is already taken, sleep for far too long and
> only wake up when another worker finishes a split, then contend with that
> worker for another split. Meanwhile, most of the split list may sit idle,
> never actually checked or claimed by anyone.
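The four steps above can be modeled in a few lines to show why reservations
clump. This is a single-threaded, in-memory sketch with hypothetical names
(`OldReservationSketch`, `State`, `claim`); the array stands in for the
ZooKeeper split nodes, and returning -1 stands in for the worker going to sleep.

```java
import java.util.Arrays;

/** Hypothetical model of the reservation steps described above. */
class OldReservationSketch {
  enum State { UNCLAIMED, RESERVED, FINISHED }

  /** Returns the claimed split index, or -1 if the worker would sleep. */
  static int claim(State[] splits) {
    for (int i = 0; i < splits.length; i++) {
      if (splits[i] == State.FINISHED) {
        continue; // Step 1: skip finished splits.
      }
      // Step 2: the first unfinished split is the only one examined.
      if (splits[i] == State.UNCLAIMED) {
        splits[i] = State.RESERVED; // Step 3: reserve it (a ZK write in Giraph).
        return i;
      }
      return -1; // Step 4: first candidate already taken, so give up and sleep.
    }
    return -1; // Every split is finished.
  }

  public static void main(String[] args) {
    State[] splits = new State[10];
    Arrays.fill(splits, State.UNCLAIMED);
    for (int worker = 0; worker < 4; worker++) {
      System.out.println("worker " + worker + " -> " + claim(splits));
    }
  }
}
```

Calling `claim` for four workers in turn on ten fresh splits, the first worker
gets split 0 and the other three get -1, even though nine splits are still
unclaimed.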
> This does not work. By making a few simple changes (and acknowledging that ZK
> reads are cheap and only writes are expensive), this patch is able to get
> every worker involved and keep them in the game. This ensures that
> INPUT_SUPERSTEP passes quickly and painlessly, without overwhelming Netty,
> because the memory load the split readers bear is spread more evenly. If the
> giraph.splitmb and -w options are set correctly, behavior is now exactly as
> one would expect it to be.
> This also lets INPUT_SUPERSTEP pass more quickly, and lets a job survive
> INPUT_SUPERSTEP for a given data load on fewer Hadoop memory slots.
>  
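A sketch of the read-before-write idea, assuming only what the description
states: reading a split's status is cheap, so a worker can check every split and
spend writes only on splits that look free. `ScanAllReservationSketch` and its
constants are hypothetical; an `AtomicIntegerArray` stands in for ZooKeeper, and
a failed `compareAndSet` plays the role of losing a reservation race to another
worker (for example, a znode create that fails because the node already exists).
A worker only gives up when every split is reserved or finished.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Hypothetical model: scan all splits with cheap reads, write only when free. */
class ScanAllReservationSketch {
  static final int UNCLAIMED = 0;
  static final int RESERVED = 1;
  static final int FINISHED = 2;

  /** Returns the claimed split index, or -1 if nothing is claimable now. */
  static int claim(AtomicIntegerArray splits, int startIndex) {
    int n = splits.length();
    for (int k = 0; k < n; k++) {
      int i = (startIndex + k) % n;
      // Cheap read: skip anything already reserved or finished.
      if (splits.get(i) != UNCLAIMED) {
        continue;
      }
      // Expensive write, attempted only on a split that looked free.
      // compareAndSet fails if another worker reserved it in the meantime,
      // in which case we simply keep scanning instead of going to sleep.
      if (splits.compareAndSet(i, UNCLAIMED, RESERVED)) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) throws InterruptedException {
    int numSplits = 10;
    int numWorkers = 4;
    // A new AtomicIntegerArray is all zeros, i.e. every split UNCLAIMED.
    AtomicIntegerArray splits = new AtomicIntegerArray(numSplits);
    int[] claimed = new int[numWorkers];
    Thread[] workers = new Thread[numWorkers];
    for (int w = 0; w < numWorkers; w++) {
      final int id = w;
      // Evenly spaced start offsets, one per worker.
      workers[w] = new Thread(
          () -> claimed[id] = claim(splits, id * numSplits / numWorkers));
      workers[w].start();
    }
    for (Thread t : workers) {
      t.join(); // join() makes each worker's write to claimed[] visible here.
    }
    System.out.println(Arrays.toString(claimed));
  }
}
```

The evenly spaced start offsets in `main` are just the simplest way to spread
workers out; they are not the hashing mentioned in the comment above.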

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

