I just wanted to move a discussion we had on IRC to the list by putting out what I have now come to believe about the problem after putting it through the Oskar filter.
We have problems with the way 0.4 nodes behave when they become overloaded - they tend to become unreachable to new connections and slow down to a crawl, but without this having the effect of lowering the load. After thinking about it, I believe this is because the only way the node responds to high load is completely out of line with the 0.4 protocol.

The current system is pretty much the same as the old one - when a node becomes overloaded (which almost always manifests itself in an absolute way by the threadpool filling up) it stops accepting incoming connections. This worked decently in 0.3 and below, because then every new query would arrive on a new connection, and replies would almost always come over existing ones. This correlation between the transport/session handling (the connections) and the application-level protocol meant that blocking a transport-layer connection was more or less equivalent to rejecting further requests.

With 0.4 we removed this correlation between the layers: messages are now sent between nodes on whatever open connections are available or made, and queries are just as likely to arrive on existing connections (incoming or outgoing) as on new incoming ones. We didn't, however, implement any new overload handling beyond the rejection of new incoming connections.

It became clear quite soon that people's 0.4 nodes were keeping all their connections open all the time. I took this to mean that the algorithms for closing idle connections needed tuning - that most of the connections locking up threads were simply redundant and rarely used. However, much stricter rules and the addition of connection pruning (which kills off all connections that are even slightly old when the node runs out of threads) haven't seemed to help at all.

I am now coming to a different conclusion - the problem is not unused connections but used ones. The combination of datastore problems leading to very few stable nodes, and the widespread use of god-awful network-flooding applications like "Frost", means that the nodes from which I have seen logs and stats really are overloaded - and because the correlation between new connections and new queries no longer exists, our transport/session-layer response to overloading has very little effect on the application-layer problem of query overloading.

I now believe that the correct approach to this problem should be:

- When the load on the node starts approaching the limit of what it can take, it should start rejecting queries at the application layer, by replying to (Data/Insert/Announcement)Request messages with QueryRejected. This should have the same effect that rejecting incoming connections had in 0.3. (A sketch of this follows the list.)

- We want to avoid rejecting incoming connections whenever possible, because in 0.4 this causes us to fail to get replies as often as it causes us to accept fewer queries. So if we get close to running out of threads entirely, we should start dropping existing connections to make way for new ones. I now think the whole pruning algorithm is unnecessary; this can simply be done one for one. It seems logical to me that the connection we drop should be the least recently used one, but GJ believes this is bad, through some arguments - which make no sense to me - about attackers who could not mount a real attack anyway; if it matters to others, then a random currently idle connection probably works as well. (See the second sketch below.)
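To make the first point concrete, here is a minimal, self-contained Java sketch of what application-layer rejection could look like. Everything in it (LoadSketch, the stand-in message classes, the thread counts and the 80% threshold) is an illustrative assumption rather than the actual 0.4 code; only the idea of answering a request with QueryRejected comes from the proposal above.

    // A sketch of application-layer load rejection, under the assumption
    // that the node can see how many threadpool threads are busy. All
    // names here are illustrative stand-ins, not the actual 0.4 classes;
    // only the QueryRejected reply is part of the protocol as proposed.
    public class LoadSketch {

        static final int MAX_THREADS = 120;   // assumed threadpool size
        static final double REJECT_AT = 0.8;  // start rejecting at 80% load

        // Stand-ins for the real message types.
        interface Message {}
        static class DataRequest implements Message {}
        static class QueryRejected implements Message {}

        static int activeThreads = 100;       // pretend the node is busy

        // When load approaches the limit, answer (Data/Insert/Announcement)
        // Request messages with QueryRejected instead of servicing them.
        // The connection the request arrived on is left alone.
        static Message handle(Message incoming) {
            if (activeThreads >= MAX_THREADS * REJECT_AT) {
                return new QueryRejected();   // refuse at the application layer
            }
            activeThreads++;                  // a thread picks up the query
            return null;                      // null here means "accepted"
        }

        public static void main(String[] args) {
            Message reply = handle(new DataRequest());
            System.out.println(reply instanceof QueryRejected
                    ? "rejected: node is near its thread limit"
                    : "accepted");
        }
    }

The point is that the connection itself is never touched: the rejection happens one protocol level up, so replies to our own outstanding queries can still arrive over it.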
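And a similarly hedged sketch of the one-for-one replacement from the second point, using an access-ordered LinkedHashMap to track which connection was least recently used. The class and method names are hypothetical; a real version would hook into the node's connection table and actually close the socket.

    import java.util.LinkedHashMap;

    // A sketch of the one-for-one replacement policy: when the table is
    // full and a new connection arrives, drop exactly one existing
    // connection (the least recently used) rather than refusing the new
    // one or pruning in bulk. The names are hypothetical illustrations.
    public class ConnectionTable {

        private final int maxConnections;

        // accessOrder=true makes the map iterate from least to most
        // recently used, so the first key is always the LRU candidate.
        private final LinkedHashMap<String, Long> conns =
                new LinkedHashMap<>(16, 0.75f, true);

        public ConnectionTable(int maxConnections) {
            this.maxConnections = maxConnections;
        }

        // Record traffic on a connection so it counts as recently used.
        public void touch(String peer) {
            conns.put(peer, System.currentTimeMillis());
        }

        // Accept a new connection, dropping the LRU one if we are out of
        // room - one for one, with no bulk pruning pass.
        public void accept(String peer) {
            if (conns.size() >= maxConnections && !conns.containsKey(peer)) {
                String lru = conns.keySet().iterator().next();
                conns.remove(lru); // a real node would close the socket here
                System.out.println("dropped LRU connection to " + lru);
            }
            touch(peer);
        }

        public static void main(String[] args) {
            ConnectionTable t = new ConnectionTable(2);
            t.accept("alice");
            t.accept("bob");
            t.touch("alice");   // alice is now the most recently used
            t.accept("carol");  // table full: bob (the LRU) gets dropped
        }
    }

If GJ's concerns about LRU carry the day, the only change is picking a random idle key from the map instead of the eldest one.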
It should be noted that this second behavior will have absolutely no detrimental effect on the load - that problem is moved entirely into the application layer, where it belongs.

Beyond this lies the use of real load balancing in the protocol, by having nodes weigh the probability of resetting the DataSource against their relative load on the network - but I think we can leave that for another day.

--
Oskar Sandberg
oskar at freenetproject.org
