On Wed, 10 May 2000, Ian Clarke wrote:

> > > Yes, I have just read his answer, but it is your opinion that he
> > > addressed my points convincingly, not mine!
> > But in your reply to my post you cut off everything except two lines. You
> > take up some more below, but you still haven't addressed my main defense
> > for why it works, namely that the propagation of the update is exactly
> > like the propagation of the newly inserted data.
> 
> A few things.  Firstly, that is kind of the pot calling the kettle black

Yes, but the kettle started it...

> - since your criticism of my proposal was sketchy at best (basically
> "this is broadcast, broadcast is bad" without significant further
> explanation. Now even LDC - not someone prone to agreeing with me -
> concedes that a very restricted broadcast might be a good idea).  The
> difference between the propagation of this update, and the propagation of
> an insert is that when an insert is propagated, you can be sure that
> Freenet won't be full of nodes which are caching the data which you are
> inserting.  Our ideas are more alike than you suggest, I agree that we
> need a mechanism to bypass locally cached data, and some form of expiry
> would allow this, but your proposal addresses this issue, which is why I
> tried to incorporate it into my design.

While it is true that if you kill the broadcast wherever the data is not
found it will not grow out of proportion, that is not the only reason I
dislike anything that has broadcast in it. Basically, any form of "we will
send it everywhere" is fundamentally not working intelligently within the
system and taking advantage of it. It could be that it is impossible to make
updates work while taking advantage of the current system (I think it is),
but then we should change the system so that it becomes possible (the
"downwards references" would be one such change, though probably not a good
one), not try to force-fit it.

I know that our proposals share a lot, but I have two objections to the way you
suggest things as it stands:

a) The "explosion" makes no use of the natural tendencies of Freenet
(references converge to a point where the data can be found, they don't
diverege to everywhere it is cached), and as such will have little or no effect.

b) Using "expiry" causes one's data to die if one is unable to update it. Not
using expiry makes updates infeasable because of a).

<snip - meta-discussions about the discussion are seldom helpful>

> > b) I think you misunderstand me. The Update gets sent to 10-15 nodes
> > because it is sent just like a normal InsertRequest with HTL (I guess) of
> > around 10-15. If no follow-through request for the data can reach it when
> > it has gone this far, then how do requests find newly inserted data,
> > which also traveled 10-15 nodes using standard Request routing?
> 
> But this means that the insert only reaches one "epi-centre" node, and a
> line of nodes between you and it.  It will still be shielded by nodes
> caching the data.  You suggest a special request which "penetrates" this
> shield of cached data, but 

But you argued that even with a request that penetrates this it would not work.
It does work (if normal requests and inserts do). You are right that it
increases the number of messages that nodes will see, but I still don't believe
your version will work at all.

> > If the answer is yes, then the follow-through requests will find the
> > updated data, because they route with respect to the key exactly like
> > requests for this new data in my example would with respect to the very
> > close key.
> 
> But the problem is that in this case the node, or small number of nodes,
> which actually have the updated data will receive all requests sent with
> this "follow-through" flag set.  Now from the users' perspective they are
> *always* going to want the latest version of the data, thus they are
> always going to set the follow-through flag, and thus if the data is in
> any way popular (such as a Freenet version of /.) these central servers
> will rapidly fall over.  Freenet will, in effect, no longer live up to
> its promise of being slashdot-effect-proof.

Users will not always use follow-through because it is much slower, and users
are lazy. Freenet hops take a lot of time - currently we estimate a whopping
12 seconds per hop - so you do not want to have the request go to maximum
depth unless you are convinced you have to.

In fact, it will probably make sense from a time/effort perspective to start
with a non-follow-through meta-data request to see if that retrieves the latest
version.
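
To make the laziness concrete, here is a rough sketch of what I imagine a
client doing (Python, with made-up names - request_data and request_metadata
are not anything in the actual code):

    # Hypothetical client strategy: probe cheaply before paying for a
    # deep follow-through request at ~12 seconds per hop.
    SHALLOW_HTL = 3    # a cheap, local-ish probe
    DEEP_HTL = 15      # full follow-through depth

    def fetch_latest(key, cached_version, request_data, request_metadata):
        # Step 1: shallow meta-data request to see if anything newer exists.
        meta = request_metadata(key, htl=SHALLOW_HTL)
        if meta["version"] <= cached_version:
            return None    # our copy is current, skip the expensive part
        # Step 2: only now send the slow, full-depth follow-through request.
        return request_data(key, htl=DEEP_HTL, follow_through=True)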

That said, I do agree that it leads to more messages for certain nodes. I
think that measures like the "no-update-before" date, or possibly the
algorithm from Squid that Theo described this morning, would work to stop it.
After all, if your reasoning were correct then Squid would not take any load
off the web servers and ISPs, since it allows one to force the proxy to check
whether the website has been updated. Obviously, this is not the case.
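
To sketch what I mean by containing it, in the spirit of Squid's freshness
check (Python; the field names are my invention):

    import time

    # A node's decision on whether a cached entry may answer even a
    # follow-through request.
    def may_answer_from_cache(entry, follow_through):
        if not follow_through:
            return True    # normal requests always stop at a cache hit
        # Follow-through requests also stop while the author has
        # promised that no update can exist yet.
        return time.time() < entry["no_update_before"]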

> If, on the other hand, we try to make more nodes respond to the
> DataRequest for the updated data, using the "explosion" mechanism I
> propose, then a much larger number of nodes will be capable of
> responding to these DataRequests (hopefully a number proportional to the
> number of nodes actually caching the data), and we avoid the /. effect.

But, given the existence of a "no-update-before" date on the data, this is
exactly what I want to do, except that the initial spreading to other nodes
would not happen at insert, but as people requested the new data. I find your
resistance to this a little weird, since this is the brilliance of the
Freenet design: the data is brought to more nodes capable of serving it not
because it is sent to a lot of places on insert, but because it is sent to
more places every time it is requested. The same effect should be used for
updates.
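
Crudely, the effect I mean (pure illustration, not node code):

    # Every node on the reply path caches the data it passes back, so
    # each successful request leaves more nodes able to serve the new
    # version - exactly as with a fresh insert.
    def cache_along_reply_path(path, data, caches):
        for node in reversed(path):    # reply walks back along the path
            caches[node] = data

    caches = {}
    cache_along_reply_path(["n1", "n2", "n3", "n4", "n5"], "version 2", caches)
    # One request later, five nodes can serve version 2 directly.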

Here is an alternative to the "no-update-before" idea. Instead of the full
follow-through flag in a message, you have a "return-newer-version-than"
field. This would not return any data unless it found data of a later version
than that given in the field. An attempt to see if data had been updated
would simply mean sending this sort of request with the last known version in
the "return-newer-version-than" field.

This would ensure that local or close-proximity caches of the last version
are skipped, but it would also ensure that as soon as an update was found,
that would be returned, keeping the load off the "epi-center" nodes.
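
In node terms it might look like this (Python sketch; the field and the
helper names are hypothetical):

    # Handling a request carrying "return-newer-version-than": cached
    # data only satisfies the request if it is strictly newer than the
    # requester's copy.
    def handle_request(key, newer_than, store, forward):
        entry = store.get(key)
        if entry is not None and entry["version"] > newer_than:
            return entry    # found an update, answer immediately
        # Our stale copy is skipped; the request routes onward, but it
        # stops at the first node holding anything newer.
        return forward(key, newer_than)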

I'm not sure I really like it, because it would be hard to know if you really
had the latest version or just a later version, but it would sort of work to
propagate new versions, avoiding completely your concern about too many
requests reaching the same nodes.

> The argument that this "explosion" of messages will swamp the network is
> also incorrect - think about it.  What is the ideal result of an update
> (whatever the mechanism)?  It is that all of the nodes currently caching
> the data to be updated, will (after the update) be caching the updated
> data.  This means that at some point, sooner-or-later, they must recieve
> the update, whether through my explosion mechanism, or through your
> mechanism where the updates will be carried in DataReplies.  Either way,
> there is a lower-bound on the number of messages which must be sent for
> a complete data-update, and this lower-bound is directly proportional to
> the number of nodes caching the data.  If an "explosion" is done
> correctly, it should result in a number of messages being sent that is
> roughly proportional to the number of nodes caching the data.

Yes, I do understand this. But as I noted, swamping the network is not my
only concern about anything "broadcast". My biggest concern in this case is
that I don't see it working at all.

> > But it does answer the Request; it just performs a very light "make sure
> > there is no newer data within reach" operation on certain requests.
> 
> But this "make sure..." process will result in a /. effect on popular
> data as I point out above.

Sites get Slashdotted because they serve data and often run CGI scripts and
the like. If every request that came from being "Slashdotted" were the
equivalent of a ping, sites would not fall over.

But again, I do want to take measures to contain this.

> > I still don't think this sort of "constrained explosive" routing will work
> > downstream. Having cached the data is simply not equivalent to having a
> > link from the epi-center. Why should it be?
> 
> Can you clarify this - I don't understand what you mean here.

Very simplified:

I request some data, and my request goes through 5 nodes, all of which cache
the data and gain a reference to the node that had the data. Nodes 1-4 now
have the data and a reference to node 5. Node 5 does not have to have any
references to nodes 1-4 (chances are it will, but nothing says it has to,
and if it does, it is not the result of this request).

The author of the data then sends an update. It takes a different path and
reaches node 5 through some other nodes.

Now, nodes 1-4 will all know where to look for an update to the data - they
look in the same place that they would have looked for the data had they
forgotten it, node 5.

But what says that node 5 knows where to send an update of the data? Unless
you have "downwards references", you are relying on the somewhat vague notion
that the nodes would inherently be close from node 5's perspective, since a
request that fit node 5's bias was routed to them. But that is very vague if
you consider that if datastores have 500 references, the search is getting
500 times more precise with each hop - nothing says that node 4, and
especially not nodes 3 and 2, and absolutely not node 1, are really that
close at all.
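
Spelling out the asymmetry (illustration only, not node code):

    # Nodes 1-4 each gain a reference to node 5, but node 5 learns
    # nothing about nodes 1-4, so it has no obvious place to push an
    # update to.
    refs = {n: set() for n in ("n1", "n2", "n3", "n4", "n5")}

    def successful_request(path, source):
        for node in path:
            refs[node].add(source)    # upstream nodes learn the source...
        # ...but refs[source] is untouched: no "downwards references".

    successful_request(["n1", "n2", "n3", "n4"], "n5")
    assert refs["n1"] == {"n5"} and refs["n5"] == set()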

> > I think you get stuck at an unholy compromise between not working and
> > causing too many messages.
> 
> If the explosion won't work, then your proposal definitely won't work
> since it results in far fewer nodes receiving the update initially.  As
> for too many messages, see my argument above.

The explosion will "work" if you have some other way of getting past cached
data. It just will not have any worthwhile effect: the update will not reach
caches much better than it does with a simple follow-through insert.

Your method for getting past cached data is to delete it when it expires. My
method is to check if there is a newer version after it has expired. As you
said, the difference is not that big - but it is important.

> > But then Dissident X gets caught and shot by Regime Y (it wasn't
> > Freenet's fault, his woman ratted him out!). While his loss is a sad
> > thing, at least we want to make sure that it does not mean that his
> > famous page with info about Regime Y disappears too!
> 
> Well, the dissident knew what he was doing when he set the expiry - he
> doesn't have to do this, it is just useful because it will result in a
> faster update.  He could always have two versions of the data, one with
> an expiry, which updates quickly, and the other without, which updates
> less efficiently.

Except that people, because they are people, would only use the version that
updates quickly, so the other would soon be forgotten and lost to the
network. And if people were using the seldom-updated one, then it would be
cached in enough places to go from seldom updated to never updated.

> > I will take the appropriate step and make the same modification to my own
> > proposal, but one that does not involve the pitfalls of actually killing
> > off data on the network.
> >
> > We use a deep/follow-through request system. However, data contains a
> > storable field that gives the period during which we are sure we will NOT
> > get another update. During this period, even follow-through requests will
> > terminate on finding the data.
> >
> > It still isn't perfect, since it means data updates will propagate badly
> > if at all during this period, but not being able to update for a while is
> > a lot better than the data dying if you are unable to update.
> 
> This is fine from the client end of things, but it doesn't address my
> concerns about the /. effect; the explosion propagation mechanism does.
> I would be happy with this combined with some form of explosion for
> propagation which required a number of messages proportional to the
> number of nodes caching the data.

Yes it does, because it means that as soon as the first request moves the new
data away from the "epi-center", Requests that find that data do not have to
go all the way to that "epi-center". This is exactly the same as the reason
why nodes do not sink on a new insert! (which is the heart of my idea - make
updates work like a normal insert to as large an extent as is possible).
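
As a final sketch (again with made-up structures): a request walking toward
the "epi-center" stops at the first node that already holds a sufficiently
new version, so the load drains off as copies spread.

    # Walk the route toward the epi-center and stop at the first node
    # already holding a new enough version.
    def route_toward_epicenter(path, caches, wanted_version):
        for hops, node in enumerate(path, start=1):
            version = caches.get(node)
            if version is not None and version >= wanted_version:
                return node, hops    # terminates early; epi-center untouched
        return None, len(path)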

> Ian.
> 
-- 

Oskar Sandberg

md98-osa at nada.kth.se

#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)

_______________________________________________
Freenet-dev mailing list
Freenet-dev at lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/freenet-dev
