On Thu, Jul 24, 2003 at 03:14:39PM +0100, Gordan wrote:
> On Thursday 24 July 2003 14:43, Kevin Steen wrote:
> > At 24/07/2003 14:01, you wrote:
> >  >On Thursday 24 July 2003 12:36, Michael Schierl wrote:
> >  >> Toad schrieb:
> >  >> > Changes (a ton, mostly not mine):
> >  >> > * Implemented support for ZIP containers
> >  >
> >  >...
> >  >
> >  > Call me skeptical, but I think this is an amazingly bad idea. It removes
> >  > any concept of having redundant data de-duplicated automatically. Also,
> >  > downloading a 1 MB file will potentially take quite a while. Smaller files
> >  > can be downloaded with a greater degree of parallelism. I am simply not
> >  > convinced that partial availability is a problem with a properly routed
> >  > node, and that is all this will achieve. In a way, I think this will make
> >  > the problem worse, because if the entire file cannot be retrieved or
> >  > re-assembled, then the whole site is unavailable, rather than perhaps a
> >  > few small parts of it.
> >
> > Supporting containers allows freesite authors to make the decision for
> > themselves, with the 1MB limit preventing drastic duplication on the
> > network.
> 
> I am not convinced that this decision should be left to the author. If they 
> are that concerned about it, they can upload a ZIP of the whole site 
> separately themselves and give a link to it. At best, building it into the 
> node is a bodge, and at worst, it is counterproductive. What happens when the 
> same files are linked from multiple pages, e.g. active links? Do you bundle 
> the files separately into each "archive set"? Where do you draw the line?

We have been through this about 5000 times; the conclusion last time was
that it is probably worthwhile. I admit that maybe we need to reduce the
container size limit below 1MB.
> 
> > I see the main use for containers being to keep the _required_
> > parts of a freesite together - namely the html files, layout images, PGP
> > key and Activelink.
> 
> Except that for a site with more than two pages, this becomes extremely 
> cumbersome to separate manually. An automated approach could analyse the 
> HTML and the documents it links to, but that has its own limitations: how do 
> you decide how to split the files across archives? What about when one file 
> has to go into every archive? How difficult will it be to come up with 
> "auxiliary" archives for files accessed from multiple pages?

No, it would be very easy to, for example, bundle all the static images on
a site into a single container - or all of today's index pages on an index
site.
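
(Purely as an illustration - this is not the real fproxy/insert code, and
the class and method names are invented - bundling a directory of static
files into one container under the 1MB cap is only a few lines of
java.util.zip:)

import java.io.*;
import java.util.zip.*;

// Hypothetical helper, NOT the actual node code: zip a directory of static
// files into one container, refusing to go over the 1MB limit (checked on
// raw size, so the compressed container ends up comfortably under it).
public class ContainerSketch {
    static final long LIMIT = 1024 * 1024; // 1MB container cap

    static void bundle(File dir, File out) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) throw new IOException("not a directory: " + dir);
        long total = 0;
        ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(out));
        try {
            byte[] buf = new byte[4096];
            for (int i = 0; i < files.length; i++) {
                File f = files[i];
                if (!f.isFile()) continue;
                total += f.length();
                if (total > LIMIT)
                    throw new IOException("container would exceed 1MB at " + f.getName());
                zip.putNextEntry(new ZipEntry(f.getName()));
                FileInputStream in = new FileInputStream(f);
                try {
                    int n;
                    while ((n = in.read(buf)) != -1) zip.write(buf, 0, n);
                } finally {
                    in.close();
                }
                zip.closeEntry();
            }
        } finally {
            zip.close();
        }
    }
}

The mechanics of writing the archive are the easy part; the hard part is the
one Gordan raises, deciding which files belong together.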
> 
> It is logically incoherent, and cannot be dealt with in a way that is both 
> generic and consistent. Therefore, I believe it should not be catered for, 
> especially as it doesn't add any new functionality, and the benefits it 
> provides are at best questionable.
> 
> > For me, having all of those available goes a long way
> > to differentiating "good" freesites from "bad" ones. Also, there should be
> > some saving on bandwidth and processing by not having to deal with so many
> > small files.
> 
> I disagree. Multiple small files can be dealt with in parallel - less so for 
> bigger files. In fact, there are answers in the FAQ about how to make IE and 
> Mozilla use more simultaneous connections. If an archive file goes missing, 
> that's it, no site at all. I do not believe that would be an improvement.

They _can_ be done in parallel, but since there is no redundancy, the
user ends up waiting for the slowest one. And they cause additional
request load on the network, which is not really necessary... certainly
it can be misused.
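
(To put a number on "waiting for the slowest one": a throwaway simulation
with made-up uniform fetch times - nothing to do with real node behaviour -
shows how page completion time climbs with the number of independent
fetches, since every one of them must succeed:)

import java.util.Random;

// Toy illustration with invented numbers: if every inlined file must arrive,
// the page is ready only when the slowest of its m parallel fetches finishes.
public class SlowestOfM {
    public static void main(String[] args) {
        Random rnd = new Random();
        int trials = 10000;
        int[] counts = { 1, 5, 20 };
        for (int c = 0; c < counts.length; c++) {
            int m = counts[c];
            double sum = 0;
            for (int t = 0; t < trials; t++) {
                double worst = 0;
                for (int i = 0; i < m; i++) {
                    double fetch = 5 + rnd.nextDouble() * 55; // each fetch 5-60s, arbitrary
                    if (fetch > worst) worst = fetch;
                }
                sum += worst;
            }
            System.out.println(m + " files -> page ready after ~"
                    + Math.round(sum / trials) + "s on average");
        }
    }
}

With redundancy the picture is different, because you only need some k of n
blocks rather than every single one - which is the point about FEC below.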

> 
> Say you use FEC: how many parts will a 1 MB file be split into? Four? How is 
> that going to be faster than downloading 20 much smaller files in parallel? 
> It strikes me that archive-based sites are simply not the correct tool for 
> the job in pretty much all cases.

Files less than or equal to 1MB are not split. Containers are not split,
for various reasons. However, you could probably get a performance gain
from splitting containers, because of FEC: a 1MB file becomes six
chunks, of which four are needed, so we try to fetch all six and
complete when we get four of them, just like the current splitfile
downloading code. But for the foreseeable future, automatically followed
splitfiles are a bad idea for various reasons, so we avoid them.
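
(For concreteness, the "fetch all six, complete on four" pattern looks
roughly like this - a sketch with invented method names, not the actual
splitfile code:)

import java.util.concurrent.*;

// Sketch only: start requests for all n FEC blocks, return as soon as any k
// have finished, and abandon the rest. A real version would also record WHICH
// block indices came back (the decoder needs that) and tolerate up to n-k
// failed fetches; that bookkeeping is omitted here.
public class FecFetchSketch {
    static byte[][] fetchSegment(String[] blockKeys, int k) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(blockKeys.length);
        CompletionService<byte[]> done = new ExecutorCompletionService<byte[]>(pool);
        for (int i = 0; i < blockKeys.length; i++) {
            final String key = blockKeys[i];
            done.submit(new Callable<byte[]>() {
                public byte[] call() throws Exception {
                    return fetchBlock(key); // issue a request for this block's key
                }
            });
        }
        byte[][] got = new byte[k][];
        for (int i = 0; i < k; i++) {
            got[i] = done.take().get(); // blocks until the next fetch completes
        }
        pool.shutdownNow(); // the remaining n-k requests are no longer needed
        return got;         // hand these k blocks to the FEC decoder
    }

    static byte[] fetchBlock(String key) {
        // placeholder: in reality this would go through the node's request code
        return new byte[0];
    }
}

The gain is that the segment completes on the four fastest blocks out of six,
instead of waiting on one particular slowest block.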
> 
> Freenet is a high-latency, potentially high-bandwidth network, but in this 
> case it doesn't matter - because the latency for a parallel download is fixed 
> to that of the slowest download. As they are all happening in parallel, the 
> latency penalty is effectively taken once, just as it would be for the 
> archive split-file.

A single file gets pretty bad bandwidth as well as pretty bad latency,
although these should both improve with NGRouting.
> 
> Additionally, a larger number of smaller downloads will probabilistically 
> come from more different hosts, thus maximising the use of the bandwidth on 
> the requesting node - in effect making it faster.

It's about atomicity. And it's a tool that can certainly be abused. And
maximizing the use of bandwidth is not necessarily all that great if the
page ends up waiting for one excessively slow element. The 1MB static
images container will be popular, especially if it does not change often
from day to day on a DBR site. Anyway, 1MB is an extreme; I expect most
containers to be more on the order of 50kB.
> 
> >  > Additionally, it means that even if you want to look at one or two pages
> >  > of a 100-page site, you still have to download the entire site.
> >
> > A lot of sites consist of a Table of Contents as the front page, with the
> > content in separate files. I've always found it bad for my karma when I
> > click on a very interesting link and end up with a "Data Not Found"
> > message!
> 
> Ultimately, the chances of a file going missing are the same, whether it is 
> the archive or a single file on the site. How is making sure that losses are 
> in bigger chunks going to help, on top of having to wait for longer for one 
> big file to trickle down to your node?
> 
> I do not see this as a viable alternative to proper verification and use of 
> insertion tools. If your files go missing, then use a higher HTL or re-insert 
> more frequently.

Oh joy, more forced reinsertions.
> 
> Regards.
> 
> Gordan

-- 
Matthew J Toseland - [EMAIL PROTECTED]
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.
