> Instead, we need to get routing working well.  I claim big routing
> tables will help that.  At least, we should try it.  Can anyone tell
> me why big routing tables won't help routing?

I think we need to step back and think about the problem some.  We
don't have a good idea why we get so many DNFs, and I suspect the
current design is no better than random routing.  We don't have
numbers that show how much better than random routing we are doing,
or how close we are to a good theoretical estimate of how well it
should work.

We can increase the table size, but I don't think it will change
anything.  Anyone is free to run a node with a large table and report
their experience back; I'd love to see the numbers.

I'd like to see evidence that specialization works.  I'd love to see
all nodes specializing.  I'd love to see routing working better than
random routing.  I'd love to see people tackle the latency problem.

For example, maybe flooding the network with routing answers during
insert is better than what we have now.  That way, either the routing
answer is found on disk, or it isn't and we can immediately DNF
without any request traffic to other nodes.  I've not seen a proof
that this works less well than the measured actual performance of
freenet today.  I know it would eliminate 224 out of 225 messages,
based on traffic to the heaviest node in my routing table.  The
question is whether it would add back more than those 224 messages;
that I don't know.  I know that for latencies, it would drop them to
20ms for a DNF (about 230x better) and to 20ms for a search success
(about 6224x better).  These numbers are taken from the two heaviest
entries, by message count, in my routing table.  A freenet that can
start getting data 100x to 1000x faster would, I think, be better
than what we have today.
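As a sanity check on those speedup factors, here is the arithmetic, assuming a ~20ms local table lookup; the measured DNF and success latencies are my reconstruction from the stated 230x and 6224x figures, not independently measured values:

```python
# Back-of-envelope check of the latency claims above.  The 20 ms
# local-lookup time is the figure quoted in the post; the "measured"
# latencies are reconstructed from the stated speedup factors.
local_lookup_s = 0.020

measured_dnf_s = 4.6         # ~20 ms * 230 for a DNF today
measured_success_s = 124.48  # ~20 ms * 6224 for a search success today

print(round(measured_dnf_s / local_lookup_s))      # → 230
print(round(measured_success_s / local_lookup_s))  # → 6224
```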

Some might say this doesn't scale.  Well, that's true, but it isn't
relevant; what is relevant is whether it scales better than what we
have today.  Also, I can think of trivial ways to enhance it to make
it scale better.  For example, what if we had a table of 50,000
entries and inserted routing answers into it, and when it filled, we
split it in half: half the nodes go to serve one side and the other
half serve our side.  If a request comes in for our half, presto,
instant answer; if it's for the other side, we just forward the
request on to one of the nodes serving the other half.

And for n splittings, we refine it just a touch.  Each table is
either a table of 25k to 50k individual keys, a table of 25k to 50k
key ranges, or the top table of 2-50k key ranges with bounds from 0x0
to 0xFFF...  As we split, a few of the nodes form a parent node with
a two-element table; each element holds references to all the nodes
that serve the key range it covers.  Inbound requests go out to the
closest match in our routing cache.  If all of freenet had only 50k
keys, then the request goes straight to the node that has (or had)
the data.  If freenet has 50000^2 keys or fewer, then 1/50,000 of the
keys take 0 hops to a solution, and the rest take at most 2 hops.
Likewise, 50000^5 keys or fewer would take at most 5 hops.  The total
key space would take at most 10.25 hops.  Freenet currently averages
14-15 hops on requests that actually find data in my store.  At its
current size, freenet is probably around 2 hops in my scheme, which
is around 7x better.  My scheme can be proven to work; as for the
current freenet scheme, well, I don't recall seeing that proof.
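The hop bounds follow from the depth of a 50,000-ary partition of the keyspace; the 10.25 figure falls out if we assume 160-bit keys (my assumption, consistent with the quoted number):

```python
import math

TABLE = 50_000  # entries per table, per the scheme above

def max_hops(total_keys):
    """Depth of a 50,000-ary partition tree over total_keys keys."""
    return math.log(total_keys, TABLE)

print(max_hops(TABLE ** 2))  # → 2.0   (50000^2 keys: at most 2 hops)
print(max_hops(TABLE ** 5))  # → 5.0   (50000^5 keys: at most 5 hops)
print(max_hops(2 ** 160))    # → ~10.25 (full 160-bit keyspace)
```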

Now, to add security, all nodes vote in such a way as to determine
how the splitting is done and who gets to be a parent.  For example,
for n nodes, take the XOR of the DSA values from all n nodes; call
this the voting key.  XOR your node's DSA value with the voting key;
call this your ID.  Sort the n ID values from all n nodes; the first
half forms one group that takes the first half of the key space, and
the second half takes what remains.  Trying to grab a particular part
of the key space becomes nearly impossible.
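The voting rule above can be sketched in a few lines; here each "DSA value" is just an integer stand-in (the post doesn't pin down the exact encoding), and the function name is mine:

```python
from functools import reduce

def assign_halves(node_ids):
    """Split nodes into two key-space groups by the XOR voting rule."""
    # Everyone's contribution goes into the shared voting key, so no
    # single node can steer its own position without moving everyone's.
    voting_key = reduce(lambda a, b: a ^ b, node_ids)
    ranked = sorted(node_ids, key=lambda n: n ^ voting_key)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]  # lower / upper key-space groups
```

Because changing your own value changes the voting key for every node, landing in a chosen half would require predicting the XOR of everyone else's values.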

Also, with bottom-up growth the routing tree should be fairly
balanced, and we can rebalance it by informing our parent once a day
how many nodes are below us.  The parent can then pull nodes up and
move them into a sibling if our node gets too small.  Also, a node
can be split not so that there are exactly 25k keys on each side, but
so that the number of keys under each half is as near to equal as
possible.

Ah, what about load balancing?  Beefy nodes can sign up to serve many
tables.  For example, when a node splits, all nodes that can handle
2x the traffic can serve both sides.  Maybe we dynamically grow the
table based upon the maximum load a large enough set of nodes can
handle, and don't split until too few nodes would remain on each side
to handle the data.  Maybe the large, stable nodes migrate up the
food chain to serve as parents, and as those nodes come to dominate
the upper levels they dynamically resize to be larger, because they
are beefy.

The rest of the nodes would just serve as data pumps, and any
bandwidth not consumed by routing on the nodes that do routing would
be used to pump data.  Routing should be the first priority.

Now, why do this?  Because it has a provable upper bound of around
800 bytes for the total upstream used across all nodes for a 4-hop
search, and a provable ability to find the data.  Also, at 1 Mbps on
a beefy node, that translates to 500 requests a second served out of
RAM.  For a client node issuing 2000 requests a day (my second
highest daily peak in 30 days), that is 21,600 clients per beefy
node; on a GigE connection, 21 million clients.  Also, insert time
would be the time taken to insert at an HTL reflecting how much
redundancy you want and how much you want to shield your identity.  A
default HTL of 2-5 might be reasonable.  For daily inserts that don't
have to be very redundant and have no anonymity concerns, insert at
HTL 0, put it in the routing engine, and around 200 ms later it is
live to all of freenet with probability 1.
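The capacity arithmetic checks out, assuming a per-node upstream share of 250 bytes per request (my assumption, chosen to match the quoted 500 req/s; the 800-byte figure is the total across all hops of a search):

```python
# Back-of-envelope check of the capacity claims above.
link_bps = 1_000_000       # 1 Mbps of a beefy node spent on routing
bytes_per_request = 250    # assumed per-node upstream share per request

req_per_sec = link_bps / 8 / bytes_per_request
print(req_per_sec)         # → 500.0 requests/second

client_req_per_day = 2_000  # the quoted daily peak for one client node
clients = req_per_sec * 86_400 / client_req_per_day
print(clients)              # → 21600.0 clients per beefy node
```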

Anonymity can be built on top of the routing engine on the client
side.  For example, insert at HTL 3-7, delete the data out of your
own store, and ask the other nodes to randomly delay registering with
the routing engine for 10-20 seconds.  The routing engine then
doesn't know who had the data first.  Other nodes don't really know
who had it first either, though they can point to someone who had it
before them (same as freenet currently).  Ask a random other node for
the key, instead of the routing engine, some of the time...
_______________________________________________
Devl mailing list
[EMAIL PROTECTED]
http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/devl