-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6600/
-----------------------------------------------------------
Review request for giraph.
Description
-------
* Upgrade to the most recent stable version of Netty (3.5.3.Final)
* Retry failed connection attempts, giving up after n failures
* Track requests throughout the system by assigning each one a request id and
then matching that id to the response (minor refactoring of
WritableRequest to make requests simpler and support the request id)
* Improve handling of Netty exceptions by dumping the exception stack trace to
help debug failures
* Fix a bug in HashWorkerPartitioner by making partitionList thread-safe (the
bug causes divide-by-zero exceptions in practice)
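
To illustrate the request id tracking item above, here is a minimal sketch of
the general idea. It is not the code in this patch: the class RequestTracker
and its methods register, complete and outstandingCount are hypothetical names
chosen for illustration, and the actual RequestInfo and WritableRequest changes
are in the diff. The sketch assumes each outgoing request gets a unique id and
that the response carries the same id back.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of request id tracking (not the patch's RequestInfo).
 * Each outgoing request is registered under a unique id and removed again
 * when a response with the same id arrives.
 */
class RequestTracker {
  /** Source of unique request ids, safe to use from several threads. */
  private final AtomicLong nextRequestId = new AtomicLong(0);
  /** Requests that were sent but not yet answered, keyed by request id. */
  private final ConcurrentMap<Long, String> outstanding =
      new ConcurrentHashMap<Long, String>();

  /** Record a request before sending it and return the id to send along. */
  long register(String description) {
    long requestId = nextRequestId.getAndIncrement();
    outstanding.put(requestId, description);
    return requestId;
  }

  /**
   * Match a response to its request.
   * Returns false if the id is unknown or was already completed.
   */
  boolean complete(long requestId) {
    return outstanding.remove(requestId) != null;
  }

  /** Number of requests still waiting for a response. */
  int outstandingCount() {
    return outstanding.size();
  }
}
```

With something like this, a client can wait until outstandingCount() drops to
zero and can log the descriptions of any requests that never got a response.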
This addresses bug GIRAPH-300.
https://issues.apache.org/jira/browse/GIRAPH-300
Diffs
-----
http://svn.apache.org/repos/asf/giraph/trunk/pom.xml 1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyClient.java
1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyServer.java
1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClient.java
1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestInfo.java
PRE-CREATION
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestServerHandler.java
1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ResponseClientHandler.java
1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMessagesRequest.java
1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMutationsRequest.java
1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java
1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WritableRequest.java
1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/utils/TimedLogger.java
1372575
http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java
1372575
Diff: https://reviews.apache.org/r/6600/diff/
Testing
-------
Currently, Netty connection failures cause issues with more than 75 workers in
my setup. This change allows us to reach more than 200 workers on a reasonably
reliable network that doesn't kill connections.
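
For readers unfamiliar with the retry behavior that helps with these connection
failures, the following is a rough sketch of retrying an operation until it
succeeds or n failures have occurred. It is an illustration only, not the
NettyClient code in this patch: BoundedRetry, callWithRetries, maxFailures and
waitMsecs are hypothetical names, and the fixed wait between attempts is an
assumption.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of bounded retries (not the patch's NettyClient). */
class BoundedRetry {
  /**
   * Run the attempt until it succeeds or has failed maxFailures times,
   * sleeping waitMsecs between failed attempts.
   */
  static <T> T callWithRetries(Callable<T> attempt, int maxFailures,
      long waitMsecs) {
    if (maxFailures < 1) {
      throw new IllegalArgumentException("maxFailures must be at least 1");
    }
    Exception lastFailure = null;
    for (int failures = 0; failures < maxFailures; ++failures) {
      try {
        return attempt.call();
      } catch (Exception e) {
        // Remember the cause so it can be reported if we give up
        lastFailure = e;
      }
      // Only wait if another attempt will follow
      if (failures + 1 < maxFailures) {
        try {
          Thread.sleep(waitMsecs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException(
              "Interrupted while waiting to retry", ie);
        }
      }
    }
    throw new IllegalStateException(
        "Giving up after " + maxFailures + " failed attempts", lastFailure);
  }
}
```

Keeping the last exception as the cause means the full stack trace is still
available when the caller finally gives up.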
This code passes the local Hadoop regressions and the single-node Hadoop
instance regressions. It also succeeded on large runs (200+ workers) on a real
Hadoop cluster.
Thanks,
Avery Ching
