I have been looking at how routing is done in the 0.3.9.x codebase,
specifically at Freenet.message.Request.sentToNextBest.

The current implementation isn't very robust in the face of transient 
transport problems.   The following scenario seems to happen quite often.

a) A node performs well for a while, so many references to it build up in the
data store.
b) For some reason the node can't be reached, probably because its outbound
thread limit has been exceeded.
c) The current implementation, Freenet.message.Request.sentToNextBest, gets
ConnectFailedExceptions and tries ever more distant (in keyspace) refs,
WITHOUT PAYING ATTENTION TO THE ADDRESS OF THE NODEREFS.
Refs which die with connection exceptions are removed from the DataStore.
The problem is that every ref pointing to the same physical address is often
removed quite quickly, permanently removing the corresponding node from future
routing decisions.
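To make step c) concrete, here is a toy Java model. It is not the actual Freenet code: keys are plain ints, closeness is simply |key - target| (a simplification of real keyspace distance), and Ref, route and countRefs are hypothetical names. It shows that a single request made during a short outage of address "A" deletes every ref to A that is closer to the target than the first reachable ref.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NaiveRoutingSketch {
    // Toy stand-in for a data store entry: a key plus the address of the node it points to.
    public static final class Ref {
        final int key;
        final String address;

        public Ref(int key, String address) {
            this.key = key;
            this.address = address;
        }
    }

    // Routes one request for `target`: tries refs in order of closeness and, mimicking
    // the behavior described in c), deletes every ref whose address cannot be reached.
    // Returns the address that accepted the request, or null if none did.
    public static String route(List<Ref> store, int target, Set<String> unreachable) {
        List<Ref> ordered = new ArrayList<>(store);
        ordered.sort((a, b) -> Integer.compare(Math.abs(a.key - target), Math.abs(b.key - target)));
        for (Ref r : ordered) {
            if (unreachable.contains(r.address)) {
                store.remove(r); // a transient failure is treated as a dead ref
            } else {
                return r.address;
            }
        }
        return null;
    }

    // Counts how many refs in the store still point at `address`.
    public static long countRefs(List<Ref> store, String address) {
        return store.stream().filter(r -> r.address.equals(address)).count();
    }
}
```

With refs to A at keys 10, 20 and 30 and a ref to B at key 40, routing a request for key 10 while A is unreachable returns B and leaves no refs to A in the store, even if A would have answered again a few seconds later.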

Transient transport problems won't go away. In fact, they are an indication
that a node is performing well: a node that is good at answering requests gets
more requests than it can answer, hits its outbound thread limit, and drops
some. The dropped requests appear as transport failures on the initiating end.
The real problem is that these transient failures have a disproportionate
negative impact on routing.
 
I suspect this issue has already been addressed for 0.4 in the routing
overhaul, but I haven't looked at that code.

In the meantime, it would be good to fix this problem in the 0.3.9.x codebase.

Here's what I propose.  Instead of dropping refs to nodes that can't be
reached immediately, put their physical addresses in a "probation list".  
Modify Freenet.message.Request.sentToNextBest to ignore all refs with 
addresses in the "probation list" for a fixed timeout period, and only delete
them after they have failed multiple times.
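A minimal, self-contained sketch of the probation list in modern Java, for discussion only. It mirrors the ReprobateAddress logic in the attached diff (shun for a fixed wait, count failures, forgive on success, flush old entries), but ProbationList and its method names are hypothetical, addresses are modeled as Strings (the real code would key on Freenet's Address class, which then needs sensible equals()/hashCode()), and the current time is passed in as a parameter so the logic can be tested without sleeping.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class ProbationList {
    // Per-address record: when probation ends and how many transport failures so far.
    private static final class Entry {
        long ignoreUntilMs;
        int failures;
    }

    private final long retryWaitMs;   // how long refs to a failed address are skipped
    private final int maxFailures;    // failures tolerated before refs are finally dropped
    private final Map<String, Entry> entries = new HashMap<>();

    public ProbationList(long retryWaitMs, int maxFailures) {
        this.retryWaitMs = retryWaitMs;
        this.maxFailures = maxFailures;
    }

    // True if refs to this address should be skipped at time nowMs.
    public synchronized boolean isShunned(String address, long nowMs) {
        Entry e = entries.get(address);
        return e != null && nowMs < e.ignoreUntilMs;
    }

    // Records a transport failure and restarts the probation period.
    // Returns true once the address has failed more than maxFailures times,
    // i.e. when the caller should finally treat its refs as broken links.
    public synchronized boolean recordFailure(String address, long nowMs) {
        Entry e = entries.get(address);
        if (e == null) {
            e = new Entry();
            entries.put(address, e);
        }
        e.failures++;
        e.ignoreUntilMs = nowMs + retryWaitMs;
        return e.failures > maxFailures;
    }

    // A successful connection clears the address's record entirely.
    public synchronized void forgive(String address) {
        entries.remove(address);
    }

    // Drops entries whose probation ended more than maxAgeMs before nowMs.
    public synchronized void flush(long nowMs, long maxAgeMs) {
        Iterator<Entry> it = entries.values().iterator();
        while (it.hasNext()) {
            if (it.next().ignoreUntilMs < nowMs - maxAgeMs) {
                it.remove(); // remove through the iterator, not by value
            }
        }
    }

    public synchronized int size() {
        return entries.size();
    }
}
```

In sentToNextBest, a ref would be skipped while isShunned() is true, a ConnectFailedException would call recordFailure() and only add the ref to brokenLinks when it returns true, and a successful makeConnection() would call forgive(). flush() would run from the data store maintenance task, as in the diff.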

See attached diffs for a sketch of my proposed implementation.
(This works but isn't ready to commit yet). 

Thoughts?

--gj


-- 
Web page inside Freenet:
freenet:MSK@SSK@enI8YFo3gj8UVh-Au0HpKMftf6QQAgE/homepage//
Index: message/Request.java
===================================================================
RCS file: /cvsroot/freenet/Freenet/message/Request.java,v
retrieving revision 1.32
diff -r1.32 Request.java
91a92,152
>     ////////////////////////////////////////////////////////////
>     // Address which is predestined for damnation
>     static class ReprobateAddress {
> 	public ReprobateAddress(long timeOutMs) {
> 	    this.timeOutMs = timeOutMs;
> 	    this.failures = 0;
> 	    fail();
> 	}
> 
> 	public void fail() {
> 	    ignoreUntilMs = System.currentTimeMillis() + timeOutMs;
> 	    failures++;
> 	}
> 
> 	public int getFailures() { return failures; }
> 	public boolean isShunned() { return System.currentTimeMillis() < ignoreUntilMs; }
> 	public boolean isStale(long earliestTimeMs) { return ignoreUntilMs < earliestTimeMs; }
> 
> 	private long timeOutMs;
> 	private long ignoreUntilMs;
> 	private int failures;
>     }
> 
>     private static java.util.Hashtable reprobateAddresses = new java.util.Hashtable();
> 
>     private static synchronized void addReprobateAddress(Address addr) {
> 	reprobateAddresses.put(addr, new ReprobateAddress(RETRYWAIT_MS));
>     }
> 
>     private static synchronized ReprobateAddress getReprobateAddress(Address addr) {
> 	if (addr == null) {
> 	    return null;
> 	}
> 
> 	return (ReprobateAddress)reprobateAddresses.get(addr);
>     }
> 
>     private static synchronized void forgiveReprobateAddress(Address addr) {
> 	if (addr == null) {
> 	    return;
> 	}
> 	reprobateAddresses.remove(addr);
>     }
> 
>     public static synchronized void flushReprobateAddresses() {
> 	long oldestMs = System.currentTimeMillis() - FLUSHINTERVAL_MS;
> 
> 	for (java.util.Iterator it = reprobateAddresses.values().iterator(); it.hasNext();) {
> 	    ReprobateAddress r = (ReprobateAddress)it.next();
> 	    if (r.isStale(oldestMs)) {
> 		System.err.println("REMOVED REPROBATE ADDRESS");
> 		it.remove();
> 	    }
> 	}
>     }
> 
>     private final static int MAX_TRANSPORTFAILURES = 20;
>     private final static long RETRYWAIT_MS = 60*5 * 1000;
>     private final static long FLUSHINTERVAL_MS = 24*60*60*((long)1000);
> 
>     ////////////////////////////////////////////////////////////
97c158,159
< 	
---
> 	ReprobateAddress ra = null;
> 
104a167,174
> 		    ra = getReprobateAddress(addr);
> 		    if (ra != null) {
> 			if (ra.isShunned()) {
> 			    System.err.println("SKIPPING SHUNNED ADDRESS: " + addr.toString());
> 			    addr = null;
> 			}
> 		    }
> 
129a200,201
> 
> 	    ConnectionHandler ch = null;
131c203
< 		ConnectionHandler ch = n.makeConnection(addr);
---
> 		ch = n.makeConnection(addr);
133a206,208
> 		// If we got this far, there wasn't a problem
> 		// with the transport.
> 		forgiveReprobateAddress(addr);
135c210,226
< 		brokenLinks.addElement(kmm.lastAttempt);
---
> 		System.err.println("TRANSPORT FAILURE: " + addr.toString());
> 		if (ra == null) {
> 		    addReprobateAddress(addr);
> 		}
> 		else {
> 		    ra.fail();
> 		    if (ra.getFailures() > MAX_TRANSPORTFAILURES) {
> 			System.err.println("TOO MANY TRANSPORT FAILURES: " + addr.toString());
> 			brokenLinks.addElement(kmm.lastAttempt);
> 		    }
> 		}
> 
> 		// Release the connection
> 		if (ch != null) {
> 		    ch.forceClose();
> 		}
> 		
138a230,233
> 		// Release the connection
> 		if (ch != null) {
> 		    ch.forceClose();
> 		}
146a242
> 	    System.err.println("REMOVING LINK FOR REF: " + badRef);
150a247
> 
Index: node/DataStoreMaintence.java
===================================================================
RCS file: /cvsroot/freenet/Freenet/node/DataStoreMaintence.java,v
retrieving revision 1.6
diff -r1.6 DataStoreMaintence.java
57a58
> 		Freenet.message.Request.flushReprobateAddresses();
