> On Aug. 15, 2012, 2:03 p.m., Alessandro Presta wrote: > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyClient.java, > > line 389 > > <https://reviews.apache.org/r/6600/diff/1/?file=139832#file139832line389> > > > > This looks a bit weird (using a TimedLogger for the timing but doing > > the actual logging on the raw Logger) although I see where the problem is > > (need to call it multiple times without waiting for the next deadline). > > Avery Ching wrote: > yeah, we can probably augment TimedLogger to have a level in a future > patch.
Yeah, also making it a full-fledged Logger (with its own isInfoEnabled() for example) would improve encapsulation. But this is very minor stuff. - Alessandro ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6600/#review10330 ----------------------------------------------------------- On Aug. 15, 2012, 6:52 p.m., Avery Ching wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6600/ > ----------------------------------------------------------- > > (Updated Aug. 15, 2012, 6:52 p.m.) > > > Review request for giraph. > > > Description > ------- > > * Upgrade to the most recent stable version of Netty (3.5.3.Final) > * Try multiple connection attempts up to n failures > * Track requests throughout the system by keeping track of the request id and > then matching the request id to the response (minor refactoring of > WritableRequest to make requests simpler and support the request id) > * Improved handling of netty exceptions by dumping the exception stack to > help debug failures > * Fixes bug in HashWorkerPartitioner by making partitionList thread-safe > (this causes divide by zero exceptions in real life) > > > This addresses bug GIRAPH-300. > https://issues.apache.org/jira/browse/GIRAPH-300 > > > Diffs > ----- > > http://svn.apache.org/repos/asf/giraph/trunk/pom.xml 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyClient.java > 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyServer.java > 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClient.java > 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestInfo.java > PRE-CREATION > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestServerHandler.java > 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ResponseClientHandler.java > 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMessagesRequest.java > 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMutationsRequest.java > 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java > 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WritableRequest.java > 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java > 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java > 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java > 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/utils/TimedLogger.java > 1372575 > > http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java > 1372575 > > Diff: https://reviews.apache.org/r/6600/diff/ > > > Testing > ------- > > Currently, netty connection failures causes issues with more than 75 workers > in my setup. This allows us to reach over 200+ in a reasonably reliable > network that doesn't kill connections. > > This code passes the local Hadoop regressions and the single node Hadoop > instance regressions. It also succeeded on large runs (200+ workers) on a > real Hadoop cluster. > > > Thanks, > > Avery Ching > >
