Hi Ian, hi Oscar,
I'll try to defend Oscar's proposal for updating, because I think it's
a pretty brilliant idea, and your arguments for your own idea, Ian,
just don't convince me (sorry). I'll try to outline why I think it
will certainly not work.
Ian Clarke wrote:
>
> A few things. Firstly, that is kind of the pot calling the kettle black
> - since your criticism of my proposal was sketchy at best (basically
> "this is broadcast, broadcast is bad" without significant further
> explanation).
But it is actually that simple. Your approach - let's call it the
fireworks approach - will effectively spawn multiple messages (with low
HTL) for every node that has a copy of the data - which means that the
number of messages sent for this operation will grow exponentially with
the popularity of the data, IMHO a BAD THING(tm).
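As a back-of-envelope illustration (the cost model and all numbers here
are my own assumptions for the sake of the argument, not measurements -
it also ignores the duplicate messages near the epi-centre, which only
make things worse):

```python
def fireworks_messages(copies, fanout, htl):
    """Rough cost model of the 'fireworks' update: every node holding
    a copy spawns `fanout` low-HTL messages, and each of those is
    forwarded for up to `htl` further hops."""
    return copies * fanout * htl

# The more popular the document (the more cached copies), the more
# traffic a single update operation generates:
for copies in (10, 100, 1000):
    print(copies, "copies ->", fireworks_messages(copies, fanout=4, htl=3))
```

So even under this deliberately simple model, the update traffic scales
directly with how widely the document is cached.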
What's worse, all the nodes that no longer have the data
(or a reference to it) will just send the new message (with small HTL)
on towards their "best guess" - which, if our routing works,
will be the so-called epi-centre. This makes your idea even worse,
because as I see it this completely defeats the purpose. Nodes close
to the epi-centre will get multiple instances of the update message
(maybe with payload, maybe without), whereas most nodes that already
have a copy of the document won't profit from this operation at all.
Let me illustrate:
Node A requested a document from node B, which requested it on A's
behalf from node C (which is very close to the epi-centre). Now
let's assume that the data died on node B, and also that node C does
not know about node A (a likely case). Now the publisher sends her
updated document (directly or indirectly) towards the epi-centre.
As node C receives the update, it will do your explosion thingy.
One of the small-HTL updates is sent towards node B.
Node B does not know about the data any more, nor does it have a
reference for it. All it can do now is send this small update
on _towards_ the epi-centre, when it should actually be heading
away from it. It will not be able to say: hey, this data was once
requested from me by node A - I should send the update on towards A.
So you see, you will fail to notify node A of the update, and
instead generate dozens of update messages headed towards the
epi-centre.
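To make the one-way-reference problem concrete, here is a tiny sketch
(the table layout and function are purely illustrative and mine, not
how Freenet actually stores routing state):

```python
# One-way routing knowledge, mirroring the A/B/C example above.
routing_table = {
    "A": {"Q": "B"},  # A fetched Q via B, so A holds a reference
    "B": {},          # the data and its reference died on B
    "C": {"Q": "E"},  # C sits close to the epi-centre
}

def forward_update(node, key):
    """A node forwards an update towards its best guess for `key`;
    it keeps no record of who requested `key` from it in the past."""
    return routing_table[node].get(key, "best guess: towards epi-centre")

print(forward_update("B", "Q"))  # B cannot route the update back to A
```

Node B's only option is its generic best guess, which points back
towards the epi-centre - exactly the wrong direction.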
> Now even LDC - not someone prone to agreeing with me -
> concedes that a very restricted broadcast might be a good idea). The
> difference between the propagation of this update, and the propagation of
> an insert is that when an insert is propagated, you can be sure that
> Freenet won't be full of nodes which are caching the data which you are
> inserting. Our ideas are more alike than you suggest, I agree that we
> need a mechanism to bypass locally cached data, and some form of expiry
> would allow this, but your proposal addresses this issue, which is why I
> tried to incorporate it into my design.
But unless you start to create two-way references (which would have
many more complications), there will be no way to find all copies of
the document at insert time. That means you have to do at least some
of the work at request time.
> > If the answer is yes, then the follow-through requests will find the
> > updated data, because they route with respect to the key exactly like
> > requests for this new data in my example would with respect to the
> > very close key.
>
> But the problem is that in this case the node, or small number of nodes,
> which actually have the updated data will receive all requests sent with
> this "follow-through" flag set. Now from the users perspective they are
> *always* going to want the latest version of the data, thus they are
> always going to set the follow-through flag, and thus if the data is in
> any-way popular (such as a Freenet version of /.) these central servers
> will rapidly fall-over. Freenet will, in effect, no longer live up to
> its promise of being slashdot-effect-proof.
The /. effect can be combated quite easily here. The idea has been
used by DNS for a long time (yeah, I know you all don't like DNS :).
The idea is simply to have a minimum time between cache refreshes
(checks for potential updates).
This can be set either on a node-by-node basis or specifically for
each file.
Let's assume we have three nodes A,B,C which all know an intermediate
node I, which in turn knows the "epi-centre"-node E.
A --\
B - I -- E
C --/
Now let's say the minimum time between cache refreshes is 5 min.
I assume here that the HTL always runs out on node E - just for the
clarity of the example; every other case would work similarly.
Now the following could happen:
0:00 A follow-through request for data Q reaches node A. It follows
through to node I and finally to node E, where it is answered.
Now A and I know for sure what the newest version of Q is.
0:04 A follow-through request for data Q reaches node B. It is sent
on to node I, which answers it to the best of its knowledge -
that is, it tells B the newest version of Q as of its own request
at 0:00. Now comes the tricky part: node B learns about the
newest version of Q and will not re-request it until 0:09 -
five minutes later. This means that if the document Q was updated
at 0:01, node B will still give out the old version of Q until
0:09. So the update only spreads gradually over the whole
network. With a typical HTL of 10, this means up to 50 minutes
until _every node interested_ has the new version. If you think
that's too high, reduce the minimum time between cache refreshes.
0:06 A follow-through request for data Q reaches node C. It is sent
to node I, and since I last checked the version of Q more than
five minutes ago, it will ask E again for the newest version
of Q. If any newer version exists, node C will know immediately
and all nodes asking C or I in the next five minutes will be
told about it too.
This effectively combats the slashdot effect. Node E will get at
most one request per five minutes from every node that knows about
it. If node E should happen to be the one central (eek) epi-centre
for the highly popular document Q, and let's say 15 nodes know about
E, then E will get at most
15 requests / 300 seconds = 0.05 requests/second
or three requests per minute. Pretty acceptable for a node which
puts itself at the centre of all communication. :)
And if the HTL of the follow-through requests does not run out on
node E, E will itself forward at most one request per five minutes,
like every other node would. Not much load for the node beyond E.
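The refresh-interval mechanism above can be sketched like this (a
minimal model - the class and method names are mine, and a real node
would of course track this per key rather than for a single document):

```python
REFRESH_INTERVAL = 300  # seconds - the 5 minutes from the example

class Node:
    """Caches the newest known version of Q and re-checks its
    upstream node (the one towards the epi-centre) at most once
    per REFRESH_INTERVAL."""
    def __init__(self, upstream=None):
        self.upstream = upstream   # None for the epi-centre node E
        self.version = None        # newest version of Q we know about
        self.last_check = None     # when we last asked upstream

    def follow_through(self, now):
        stale = (self.last_check is None
                 or now - self.last_check >= REFRESH_INTERVAL)
        if self.upstream is not None and stale:
            self.version = self.upstream.follow_through(now)
            self.last_check = now
        return self.version

# Replaying the timeline from the example:
E = Node(); E.version = 1
I = Node(upstream=E)
print(I.follow_through(0))     # 0:00 - A's request, I asks E: version 1
E.version = 2                  # 0:01 - the publisher updates Q
print(I.follow_through(240))   # 0:04 - B's request: still version 1
print(I.follow_through(360))   # 0:06 - interval expired, I re-checks: 2
```

However popular Q becomes, node I bothers E at most once per interval;
everyone else is served from I's cache in the meantime.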
> If on the other hand, we try to make more nodes respond to the
> DataRequest for the updated data, using the "explosion" mechanism I
> propose, then a much larger number of nodes will be capable of
> responding to these DataRequests (hopefully a number proportional to the
> number of nodes actually caching the data), we avoid the /. effect.
>
> The argument that this "explosion" of messages will swamp the network is
> also incorrect - think about it. What is the ideal result of an update
> (whatever the mechanism)? It is that all of the nodes currently caching
> the data to be updated, will (after the update) be caching the updated
> data. This means that at some point, sooner-or-later, they must receive
> the update, whether through my explosion mechanism, or through your
Yes, but you talk to many nodes who aren't interested in your data at
all, and many of them will see the request multiple times. That's what
effectively sinks your scheme - it is a very "broad" operation, any
way you put it.
> > I still don't think this sort of "constrained explosive" routing will work
> > downstream. Having cached the data is simply not equivalent to having
> > a link from the epi-center. Why should it be?
>
> Can you clarify this - I don't understand what you mean here.
This argument is what I tried to say above. You may well have got a
document from a node, but that does not mean the node knows about you,
and it sure does not mean the node knows you still have that document.
> > We use a deep/follow-through request system. However, data contains a
> > storable
> > field that says how long the period is during which we are sure we will NOT
> > get
> > another update. During this period, even follow-through requests will
> > terminate
> > on finding the data.
This can be added too. But the more important mechanism is the minimum
time between cache refreshes, as outlined above.
Well, so much for my 0.02 Euro. Hope this helps somewhat.
I hope to hear some feedback, especially from you, Oscar and Ian.
Bye,
Philipp
_______________________________________________
Freenet-dev mailing list
Freenet-dev at lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/freenet-dev