[freenet-dev] Why does announcement take so long?

vive Fri, 4 Dec 2009 00:01:59 +0100

On Wed, Dec 02, 2009 at 05:02:49PM +0000, Matthew Toseland wrote:
> On Wednesday 02 December 2009 14:13:14 vive wrote:
> > It sounds like matching newly arrived nodes to free opennet slots works 
> > bad. If I understand the source correctly, this also seems natural in the 
> > current setting. Since after some time of network operation, most nodes 
> > will have announced and gotten their desired number of opennet peers. There 
> > will still be a steady inflow of announces and requests, which makes 
> > arriving or existing nodes compete and grab the slots whenever they are 
> > open. Therefore, increasing /both/ the number of desired and available 
> > opennet connections is not going to fix the problem: they will quickly be 
> > consumed by dynamics in the existing network. We should consider how there 
> > could be more available slots than the established nodes themselves require.
> > 
> > I propose to discuss the following solution, based on separating normal 
> > ("persistent") opennet connections, and announcements:
> > Nodes keep announcing as before. But let every opennet node offer a small 
> > number of temporary slots, say 3, that it does /not/ use itself for path 
> > folding but it only offers to announcing opennet nodes. The announcing 
> > nodes would then get a better chance to establish themselves during a short 
> > time window. These temporary slots could be FIFO/LRU-dropped after 1 minute 
> > upon new announcements from other opennet nodes, or automatically after 10 
> > minutes to reduce the burden on nodes. When there are free persistent 
> > slots, nodes should offer these instead than temporary ones. But when there 
> > are no persistent slots to offer, tempoary ones are offered for a short 
> > while. Newly arrived nodes only use these up to, say, maximally 10 such 
> > temporary connections simultaneously - and drop them as they operate and 
> > get persistent connections.
> > 
> > I think this scheme has the following benefits:
> > 
> > 1. Give newly announcing nodes a chance to get onto the network (to get 
> > opennet connections that are persistent, rather than temporary ones, by 
> > routing as well as announcing).
> > 
> > 2. Prevent established nodes to consume the available slots using requests, 
> > since they do not desire more than their persistent ones.
> > 
> > 3. Allow newly added opennet nodes to receive requests and path fold with 
> > each other.
> > 
> > (Perhaps nodes also need to become more greedy about getting these 
> > temporary connections close to its identity key.)
> > 
> > Does this sound interesting? If interesting enough, I could try to simulate 
> > this.
> > 
> > Cheers,
> > /vive
> > 
> > 
> > On Fri, Nov 27, 2009 at 05:36:09PM +0000, Matthew Toseland wrote:
> > > The question boils down to why do the vast majority of nodes on an 
> > > announcement chain reject the announcement. The answer appears to be:
> > > 
> > > Most of the time, we reject a new node (from path folding or an 
> > > announcement) because we only drop a connected peer every 10 minutes, 
> > > even if it is isDroppable(). Of course, we drop disconnected 
> > > isDroppable() peers (= peers that disconnected more than a few minutes 
> > > ago) very quickly, but most peers are usually connected.
> > > 
> > > Generally there are peers we could drop according to isDroppable() (more 
> > > heuristics, e.g. don't drop a node if we added it less than 5 minutes 
> > > ago), but we don't drop them because of the 10 minute throttle.
> > > 
> > > Also, we only drop a connected peer after every 10 successful requests. 
> > > This appears not to be very significant on my node however.
> > > 
> > > Thoughts? evanbd suggested a token bucket for the intervals in 
> > > OpennetManager - we have two, this one and one for adding 
> > > old-opennet-peers. This would allow more burstiness.
> > > 
> > > Or we could scrap the 10 minute interval altogether (or cut it 
> > > drastically) and rely on the 10 successful requests criterion?
> > > 
> > > Anything we do to opennet may disrupt the network if we get it wrong, and 
> > > even if it's right it may take time to settle...
> > 
> There are several problems here:
> - Announcements are slow and inefficient: We send announcements to several 
> seednodes, these pass through hundreds of nodes, and the vast majority reject 
> them. Eventually we get some connections, and through those connections we 
> get more connections.


Well, that is what the proposal is meant to improve. By reserving special slots 
for announcing nodes, it gets quicker for them to start to operate - the 
natural process by which nodes get connected. 

> - Most announcements, in normal circumstances (i.e. not during a slashdot), 
> are from existing nodes that have been offline for a while. Reconnection to 
> existing peers can serve the same function, however on many nodes this won't 
> work due to NATs (or will be slow and only work after we have some 
> connections e.g. ARKs), and on those where it could work, it doesn't because 
> reconnections are treated the same as announcements and path folding - the 
> vast majority are rejected.

But is that surprising, since other nodes can consume them quickly? When 
leaving the network, 20 slots get free but also potentially consumed (from 
other requests) after a short while. 

> - So there are worries both about announcements being crowded out by path 
> folding, reconnections being crowded out by announcements and path folding, 
> and in general the algorithm for deciding whether to connect not taking into 
> account what sort of connection it is. This can be resolved by reserving 
> slots somehow. IMHO these slots should not be additional to the peers limit, 
> they should be included in it; and when nodes exit their grace periods they 
> should be treated as any other node i.e. dropped if we need the slot and they 
> are at the bottom of the LRU list. Hence, the proposal is to have limits on 
> the number of nodes that may be in their grace periods for each category - 
> path folding, announcements and reconnects. In a given category, if we reach 
> the limit, we summarily reject new connections of that type. We would need to 
> pick arbitrary limits for each category e.g. 2 announcements 2 reconnects 
> (max_peers - 4) path folding. The numbers must add up to no more than the 
> peers limit, but whether they should be less than it or equal to it I am not 
> sure of. This mechanism could slow down announcement during slashdottings, 
> and in fact at other times, but at least it would keep path folding working; 
> it would likely make reconnections much more successful. Overall I hope it 
> would be an improvement in performance even in terms of time to connect 
> newbies, but it does involve alchemy in the parameter setting.

I disagree with treating persistent and temporary announce-related peers the 
same, unless there are free persistent slots at the end of the grace period. 
They should in no way be eaten up by being treated as persistent after a while, 
unless there are free persistent slots available by then. If announce peers 
leave the grace period they should be kicked out quickly upon new announcements 
- otherwise we are back to the same problem. 

Importantly, path folding during normal network operation should not crowd out 
more essential things: like announcing or reconnecting. After all, path folding 
is important both for allowing nodes into the network - as well as slowly 
optimizing the topology for routing. The standard folding from requests only 
has a very marginal effect after a short while a of folding. And almost 
certainly, it will crowd out announces and reconnects in the current setting. 
This is a tradeoff between node performance and how it allows other nodes to 
temporary use its resources - I think it clearly would open more ways into the 
network.

> - The code is overly complex with regards to dropping disconnected, droppable 
> peers. We should count the number of peers which are either connected or are 
> not droppable, and if this is less than the peers limit, accept the incoming 
> peer unconditionally. If it isn't, we need to apply the above logic.

Agreed on complexity. Lots of nice microoptimisations :-)
With respect to counting: yes, if the slots for available persistent peers are 
not filled - a normal slot should be given. Otherwise, a node gets a temporary 
slot (based on some scheme) within a minute or two, which is has for max 10 
minutes (but potentially only a minute or two if others ask for it!) if no 
other node asks for it. The key here is to not treat it like the other peers 
with the 10-minute-before-drop limit. But in case a persistent slot becomes 
available during that time, a temporary peer can become persistent and free a 
temporary slot. Why not also allow a slot or two for reconnecting nodes, since 
the point is to let the nodes *operate* and read network traffic for available 
requests and announcements. After all, reconnects dont happen very frequently 
to individual nodes: that dynamics is much slower.

> - Currently the dominant reason for us to reject a connection is the fact 
> that we only drop a connected opennet peer (whether it is in its grace period 
> or not) every 10 minutes. This limit should probably be significantly 
> reduced, and should apply separately to each category e.g. a new announcement 
> connection can only kick out an existing connection every N minutes. We have 
> a similar limit based on the number of successful requests since we last 
> dropped a connected peer. We might want this to apply just to path folding, 
> or to apply for each category, maybe with different limits.

I dont think one should make several changes at once - better to implement and 
evaluate a new method defensively. But maybe the idea to limit on the number of 
requests is also good: when an announcement connection has seen a number of 
successful request - it should have used those to fold with the rest of the 
network, and can be immediately dropped.

> It looks like improving announcement is probably going to have to include 
> alchemy, but well, evanbd and vivee started that, and they're theoreticians, 
> they can tell us what to set the parameters to :)

I still only see limited way for defensive experimentation, alchemy or improved 
simulations here - rather than theory. The result can only be observed if the 
implementation is global. But maybe I am wrong :-)

> Thoughts?

I would just like to re-iterate one point: to have additional slots for 
announcing (and reconnecting) for a node X, is IMHO /not/ meant for connecting 
nodes to sit on some "waiting-list" to get upgraded to persistent connections 
to node X. Instead, it is meant to allow connecting nodes to start operate and 
observe traffic in the network as soon as possible! This is what gives nodes 
connections. The hypothesis is that by operating in the network as soon as 
possible, even though only for minutes - and protecting such slots from path 
folding, this allows several things. This includes nodes that are connecting in 
parallell to e.g. see each others announcements, start normal requests 
themselves, observe other requests and so on - but I wrote more about it in the 
original mail. The key is to not have the same kind of 10-minute limit to these 
extra slots - I think that is offering them too much.
What about trying simply with a period of 2 or 3 minutes before such 
connections can be eaten up by other announcements or reconnects - and shift it 
upwards if it doesnt work? This will quickly give an announcing nodes at least 
a number of peers from those that had connections for more than 2 or 3 minutes 
- a good chance for nodes unless the pressure is very strong on the network. 

> The conversation between me and evanbd (and a little with vivee) on this 
> topic is on this bug:
> https://bugs.freenetproject.org/view.php?id=3753

Two comments after having read this conversation

1. I think it is wrong to let nodes be moved from being temporary peers to 
normal persistent peers /unless/ there are suddently free slots in the 
persistent queue when the grace period is over (or if it happens during the 
grace period). The temporary slots should only be to allow nodes quickly start 
operating in the network - path folding should be made to where there are 
available persistent slots since that is the goal. The arguments are above (and 
in the first mail).

2. One point to make regarding to interpreting the nice data that evanbd has 
collected: even though a large fraction of nodes are still probably going to be 
online Y hours afterwards - it is a question of how relevant that is to 
reconnecting. And how many nodes we really want to offer the reconnection 
possibility. During the course of operating in the network nodes are likely to 
see many different peers over the hours. There are perhaps implementation 
reasons to limit the number of peers that can reconnect using some heuristic. 
Lets say that the nodes implement the previously discussed sharing of 
store/cache states. Then, keeping track of previous opennet peers and their 
previous states may become quite costy. Such things could either limit the 
fraction of nodes we really prefer Y hours later, or that it is simply costy 
implementation-wise. I have not thought much about it, but it does not seem 
clear to allow anyone we previously saw to reconnect..


Cheers,
/vive

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
URL: 
<https://emu.freenetproject.org/pipermail/devl/attachments/20091204/30cd2a8e/attachment.pgp>

[freenet-dev] Why does announcement take so long?

Reply via email to