----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6020/ -----------------------------------------------------------
Review request for giraph. Description ------- Variable and Configuration option GiraphJob.MAX_VERTICES_PER_PARTITION was unclearly named. This config setting has nothing to do with regulating partition size, but in fact regulated the size of collections of vertices (temporarily stored in Partition objects) that were to be sent over RPC to their assigned host for the calculation super steps. By regulating these sizes based on # of vertices rather than # of edges per collection sent, the message sizes varied greatly, and in metrics based testing would overwhelm Netty buffers and cause failure of otherwise acceptable large jobs on my cluster, given my memory and worker constraints. This patch: 1. renames vars/config options to clearer titles that indicate they have to do with outgoing data burst size at the host who reads the data from InputSplit; the vertices in the data bursts are assembled into their true Partitions after traveling to the remote host where the master has assigned them according to partitionID. This greatly adds to the clarity of this very important section of code 2. Adds -Dgiraph.maxEdgesPerTransfer to regulate the data burst sizes by # of edges in a given temporary collection rather than by # of vertices, which dramatically increased throughput and kept healthy jobs from being overwhelmed at any point by a large RPC data burst during INPUT_SUPERSTEP. This repeatedly made a huge difference in the amount of data I could run in a successful job without failures, using the same memory and worker constraints I was previously running on "trunk". Diffs ----- Diff: https://reviews.apache.org/r/6020/diff/ Testing ------- July 14, 15, 16th on cluster with many times the data throughput I could achieve without it using the same memory/worker constraints. Passes 'mvn verify' etc. Thanks, Eli Reisman
