On Mon, May 4, 2009 at 6:15 PM, Matthew Toseland
<t...@amphibian.dyndns.org> wrote:
> On Monday 04 May 2009 17:29:51 Evan Daniel wrote:
>> On Mon, May 4, 2009 at 11:33 AM, Matthew Toseland
>> <t...@amphibian.dyndns.org> wrote:
>> > 1. Release the 20 nodes barrier (206 votes)
>> >
>> > As I have mentioned, IMHO this is a straightforward plea for more
>> > performance.
>>
>> I'll reiterate a point I've made before.
>>
>> While this represents a simple plea for performance, I don't think
>> it's an irrational one -- that is, I think the overall network
>> performance is hampered by having all nodes have the same number of
>> connections.
>>
>> Because all connections use similar amounts of bandwidth, the network
>> speed is limited by the slower nodes.  This is true regardless of the
>> absolute number of connections; raising the maximum for fast nodes
>> should have a very similar effect to lowering it for slow nodes.  What
>> matters is that slow nodes have fewer connections than fast nodes.
>>
>> For example, the max allowed connections (and default setting) could
>> be 1 connection per 2KiB/s output bandwidth, but never more than 20 or
>> less than 15.
>
> What would the point be? Don't we need a significant range for it to make much
> difference?

If the network is in fact limited by the per-connection speed of the
slower nodes, and those nodes are a minority of the network, then
increasing their per-connection bandwidth by 33% (by cutting them
from 20 connections to 15) should produce a throughput increase of
similar magnitude for most of the rest of the network.  A performance
improvement of 10-30% should be easily measurable, and (at the high
end of that range) noticeable enough to be appreciated by most users.
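
For concreteness, here's the arithmetic as a quick sketch; the
2 KiB/s-per-connection figure and the 15-20 bounds are the tentative
numbers from my limited testing, not settled values:

```python
def connection_limit(output_kib_per_s, kib_per_conn=2, lo=15, hi=20):
    # Proposed default: one connection per 2 KiB/s of output
    # bandwidth, clamped to [15, 20].  All three numbers are
    # tentative, based only on my limited testing.
    return max(lo, min(hi, output_kib_per_s // kib_per_conn))

# A node with 30 KiB/s output drops from 20 connections to 15:
slow_bw = 30
conns = connection_limit(slow_bw)
print(conns)  # 15
# Per-connection bandwidth rises from 1.5 to 2.0 KiB/s, i.e. ~+33%:
print(slow_bw / conns / (slow_bw / 20) - 1)  # ~0.333
```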

Really, though, the idea would be to use it as a network-wide test.
Small tests by a few users are helpful, but not nearly as informative
as a network-wide test.  Assuming the change produced measurable
improvement, it would make sense to explore further changes.  For
example, changing the range to 15-30, or increasing the per-connection
bandwidth requirement, or making the per-connection requirement
nonlinear, or some other option.  However, the security concerns
(especially the risk of ubernodes) grow with more dramatic changes.

>
>> Those numbers are based on some (very limited) testing
>> I've done -- if I reduce the allowed bw, that is the approximate
>> number of connections required to make full use of it.
>>
>> Reducing the number of connections for slow nodes has some additional
>> benefits.  First, my limited testing shows a slight increase in
>> payload % at low bw limits as a result of reducing the connection
>> count (there is some per-connection network overhead).
>
> True.

To be specific, my anecdotal evidence is that it improves the payload
fraction by roughly 3-8%.

>
>> Second, bloom
>> filter sharing represents a per-connection overhead (mostly in the
>> initial transfer -- updates are low bw, as discussed).  If (when?)
>> implemented, it will represent a smaller total overhead with fewer
>> connections than with more.  Presumably, the greatest impact is on
>> slower nodes.
>
> Really it's determined by churn, isn't it? Or by any heuristic artificial
> limits we impose...

My assumption is that connection duration is well modeled by a
per-connection half-life that is largely independent of the number of
connections.  The bandwidth used on such filters is proportional to
the total churn, so fewer connections means less churn in an absolute
sense but the same connection half-life.  (That is, bloom filter
bandwidth usage is proportional to the number of connections times
the per-connection churn rate.)  I don't have any evidence for that
assumption, though.
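
To pin the assumption down, here's a toy version of that model; the
50 KiB filter size and 24-hour half-life are made-up illustrative
numbers, and the model itself is the unvalidated assumption above:

```python
import math

def bloom_refresh_kib_per_hour(n_connections, filter_kib, half_life_hours):
    # Assumed model (no evidence for it, as noted): each connection is
    # independently replaced with a fixed half-life, so the churn rate
    # per connection is ln(2)/half_life, and the total cost of
    # re-sending full bloom filters scales linearly with the
    # connection count.
    churn_per_conn = math.log(2) / half_life_hours  # replacements/hour
    return n_connections * churn_per_conn * filter_kib

# Under this model the filter size and half-life cancel out of the
# ratio: 15 connections cost 75% of what 20 do, whatever the churn.
cost_15 = bloom_refresh_kib_per_hour(15, filter_kib=50, half_life_hours=24)
cost_20 = bloom_refresh_kib_per_hour(20, filter_kib=50, half_life_hours=24)
print(cost_15 / cost_20)  # 0.75
```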

>>
>> On the other hand, too few connections may make various attacks
>> easier.  I have no idea how strong an effect this is.  However, a node
>> that has too many connections (ie insufficient bw to use them all
>> fully) may show burstier behavior and thus be more susceptible to
>> traffic analysis.
>
> Yes, definitely true with our current padding algorithms.
>
>> In addition, fewer connections means a larger
>> network diameter on average, which may have an impact on routing.
>> Lower degree also means that the node has fewer neighbor bloom filters
>> to check, which means that a request is compared against fewer stores
>> during its traversal of the network.
>
> True.

Do you know how big a problem this would be?  My assumption is that
the effect would be fairly small even on the nodes with fewer
connections, and that those nodes would be in the minority.

>>
>> I'm intentionally suggesting a small change -- it's less likely to
>> cause major problems.  By keeping the ratio between slow nodes (15
>> connections) and fast nodes (20 connections) modest, the potential for
>> reliance on ubernodes is kept minimal.  (Similarly, if you want to
>> raise the 20 connections limit instead of lower it, I think it should
>> only be increased slightly.)
>
> Why? I don't see the point unless the upper bound is significantly higher than
> the lower bound: any improvement won't be measurable.

As above, I would hope that the improvement *would* be measurable,
even though it wouldn't be huge.

>>
>> And finally: I have done some testing on this proposed change.  At
>> first glance, it looks like it doesn't hurt and may help.  However, I
>> have not done enough testing to be able to say anything with
>> confidence.  I'm not suggesting to implement this change immediately;
>> rather, I'm saying that *any* change like this should see some
>> real-world testing before implementation, and that reducing the
>> defaults for slow nodes is as worthy of consideration and testing as
>> raising it for fast nodes.
>
> We did try this (with a minimum of 10 connections), and it seemed that slow
> nodes with only 10 connections were significantly slower. However, this was
> not based on widespread testing. My worry is that slow nodes with few
> connections will be *too* slow, and the network will marginalise them. But
> it's a tradeoff between slightly more efficiency, fewer routes to choose
> from, and fewer nodes sending requests...

Would they be any more marginalized than they already are?  If they
have fewer connections, their routes aren't as good, but with more
bandwidth per connection they should reject incoming requests less
often, right?

>>
>> Also: do we have any idea what the distribution of available node
>> bandwidth looks like?
>
> It would be great, wouldn't it? Maybe a survey? What questions should we ask?

Hmm.  Depends a bit on how general a survey you want to make it, I
suppose.  Would this be done as a new survey toadlet, or by some other
means?  I get the impression not many people answer email surveys or
feedback solicitations.

Here's a few thoughts:

-- Configured output limit
-- User-reported nominal connection speed
-- Whether the node is limited by the configured limit or by wire
speed (or, more generally, the rejection reason counts)

-- Number of opennet connections
-- Number of darknet connections
-- Number of backed-off peers
-- Recent uptime %
-- Datastore size
-- Anything else of interest from the stats page

-- Feature priorities
-- General user comments
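
If it were done as a toadlet, the report could be a simple key-value
blob along these lines.  This is only a sketch of the shape of the
data; every field name here is a placeholder I made up, not an actual
Freenet config or stats key:

```python
# Hypothetical per-node survey payload; all names are placeholders.
survey_report = {
    "configured_output_limit_kib_s": 32,
    "reported_line_speed_kib_s": 64,
    "limiting_factor": "configured_limit",  # or "wire_speed"
    "rejection_reason_counts": {"bandwidth_liability": 120},
    "opennet_connections": 18,
    "darknet_connections": 2,
    "backed_off_peers": 5,
    "recent_uptime_percent": 92,
    "datastore_size_gib": 20,
    "feature_priorities": ["performance", "filesharing"],
    "comments": "",
}
print(len(survey_report))  # 11
```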

Evan Daniel
_______________________________________________
Devl mailing list
Devl@freenetproject.org
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
