[freenet-devl] no client metadata in 0.4 CHKs?

Brandon Tue, 8 May 2001 00:41:10 -0500 (CDT)

> On Mon, May 07, 2001 at 11:07:51PM -0500, Brandon wrote:
> > There is no expense in requesting an additional file if the files are
> > requested simultaneously, at least no expense worth mentioning.
>
> There is if one of them has been dropped -


If the file is dropped then, you know, the file is dropped. That happens
all the time and it is not worth noting. If the metadata is dropped then
you don't have the file's metadata. The case where one is dropped and the
other isn't shouldn't happen since they will have exactly the same
popularity.

> and how will you know what
> metadata file to request before you have requested the first file?

We've explained this two or three times. You embed the key for the
metadata file in the key for the file. So if you have the key, you can
instantly request both.
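
For illustration, here is a minimal Python sketch of the idea. It assumes a
hypothetical key syntax `CHK@<data hash>,<metadata hash>`, SHA-1 as the
content hash, and an in-memory dict standing in for the network. The names
`insert`, `parse_key`, and `request` are made up for this sketch and are not
Freenet's API. The only point is that one key yields both hashes, so both
requests can go out at the same time.

```python
from __future__ import annotations

import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the network: maps a content hash to stored bytes.
STORE: dict[str, bytes] = {}


def chk(blob: bytes) -> str:
    """Content hash of a blob (SHA-1 hex digest, chosen for illustration)."""
    return hashlib.sha1(blob).hexdigest()


def insert(data: bytes, metadata: bytes) -> str:
    """Store data and metadata as two separate files; return one combined key."""
    data_hash, meta_hash = chk(data), chk(metadata)
    STORE[data_hash] = data
    STORE[meta_hash] = metadata
    # Hypothetical key layout: the metadata file's hash rides along in the data key.
    return f"CHK@{data_hash},{meta_hash}"


def parse_key(key: str) -> tuple[str, str | None]:
    """Pull the data hash and the (optional) metadata hash out of a key."""
    if not key.startswith("CHK@"):
        raise ValueError(f"not a CHK: {key!r}")
    data_hash, _, meta_hash = key[len("CHK@"):].partition(",")
    return data_hash, (meta_hash or None)


def request(key: str) -> tuple[bytes | None, bytes | None]:
    """Issue both lookups concurrently, so metadata adds no extra round trip."""
    data_hash, meta_hash = parse_key(key)
    with ThreadPoolExecutor(max_workers=2) as pool:
        data_future = pool.submit(STORE.get, data_hash)
        meta_future = pool.submit(STORE.get, meta_hash) if meta_hash else None
        data = data_future.result()
        metadata = meta_future.result() if meta_future else None
    return data, metadata


key = insert(b"essay text", b"Content-Type: text/plain")
data, metadata = request(key)
```

A key without a metadata part (e.g. `CHK@<data hash>`) still parses; the
client simply has no metadata file to fetch.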

> Also, what is to stop client writers just embedding the metadata in the
> document themselves?  That is what I'd do - and that doesn't solve the
> problem you outline.

Client writers won't do that because there is a predefined standard for
how to do this which will work just fine (putting the metadata in a
separate file).

> > The first problem doesn't seem like a real practical problem (who's going
> > to use this attack and what purpose will it serve?) but still seems pretty
> > dumb and should probably be avoided.
>
> You are right, it is dumb, and so I don't even see why you mentioned it,
> it doesn't serve your argument.

The decision to split metadata into a separate file evolved from a lot of
discussion. The above is the reason we can't just hash the data and leave
the metadata unhashed. I think it's dumb, too. I included it because some
people seem to think it's important, so we have to solve it when we're
designing a DMI solution. You can argue the importance of this
particular point separately with the people who think it's important.

> As we both agree, the first problem isn't a problem. Further, I don't
> think that the second problem is really all that serious either. Can you
> outline why this would be such a bad thing?  Even if there were two
> files that were similar but not identical, and thus they were stored
> under two CHKs, requests for each would be divided between the two, so
> they wouldn't be any more widely propagated than were they the same
> file.  Of course they might fall out faster, but I don't think that is a
> serious consideration compared to the annoyance of trying to separate
> out metadata.

The case of two similar files is an outlying case that we need not worry
about. The case of two identical files with separate metadata will be very
prevalent and annoying as soon as people start uploading lots of works
into Freenet. People will tend to upload everything that they think should
go into Freenet without checking to see if it's already there. Each
uploader will fill out the metadata differently, so each version of the
file will be stored separately. Every popular essay, book, movie, and MP3
will have multiple copies inserted once Freenet becomes widely used. I
think that the majority of Freenet content by size will be duplicated.
Freenet will be less reliable because 1) the duplicated items take up
space, so more things fall out, and 2) the various versions will split the
popularity score. The second is even worse than it first appears. The more
popular something actually is (as opposed to what its score says), the
more versions will be inserted. The more versions that are inserted, the
more its popularity score is divided. So as things become more popular,
their score becomes less and less a reflection of their true popularity,
messing everything up.
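
A small Python sketch of the duplication argument, assuming SHA-1 content
hashes and made-up metadata strings from two hypothetical uploaders.
Hashing data and metadata together (simplified here as plain concatenation)
gives each upload of the same bytes its own key; hashing the data alone lets
them collapse onto one key. The last lines show the popularity split with an
arbitrary request count.

```python
import hashlib


def chk(blob: bytes) -> str:
    """Content hash of a blob (SHA-1 hex digest, used for illustration)."""
    return hashlib.sha1(blob).hexdigest()


book = b"(the full text of the book, byte for byte identical)"
# Two uploaders describe the same book differently (hypothetical metadata).
uploads = [
    (book, b"Title: Walden"),
    (book, b"title=walden; author=Thoreau"),
]

# Hashing data and metadata together: every upload gets its own key.
combined_keys = {chk(data + meta) for data, meta in uploads}
# Hashing the data alone: identical files collapse onto one key.
data_keys = {chk(data) for data, _ in uploads}

print(len(combined_keys), len(data_keys))  # 2 1

# With N stored versions, the real demand is split N ways, so each copy's
# popularity score only sees a fraction of it.
requests = 1200
per_copy = requests / len(combined_keys)
print(per_copy)  # 600.0
```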

Separating out metadata actually simplifies things quite a bit. You don't
need a data and metadata section for a file. A file can just be a data
file or a metadata file. You can get rid of a header and some parsing. I
don't think it's an annoyance at all. All you have to add codewise is the
ability to parse the metadata key out of the data key. I readily volunteer
to do this so that no one needs to be annoyed.

> > This can be done without increasing request time by putting the key for
> > the metadata file in the CHK.

> This would work, but it is a slippery slope, and as I have noted, is
> hardly necessary - why not just put *all* the data in the CHK and
> forget about Freenet altogether?!

It's not a slippery slope. Putting, say, the content type in the CHK would
be a symptom of feature creep. Putting the metadata CHK in the key is
logical because currently files consist of two parts, with the break
determined by a storable field. We're reorganizing the file structure to
fix some problems and simplify things, and we're moving the break
designator from the storable field to the key, which in this case is the
only logical place for it to go.
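
To make "the break determined by a storable field" concrete, here is a rough
Python sketch of what a client does with the current two-part layout. The
field name `metadata_length`, the metadata-first ordering, and the function
name are assumptions for illustration, not the actual Freenet code. Under
the proposal this step goes away: a fetched file is either all data or all
metadata, and where the metadata lives comes from the key.

```python
from __future__ import annotations


def split_stored_file(blob: bytes, metadata_length: int) -> tuple[bytes, bytes]:
    """Split a stored file into (metadata, data) at the recorded break.

    `metadata_length` stands in for the storable field that marks where
    metadata ends and data begins. The name and the metadata-first order
    are assumptions made for this sketch.
    """
    if not 0 <= metadata_length <= len(blob):
        raise ValueError("metadata_length is outside the stored file")
    return blob[:metadata_length], blob[metadata_length:]


# Every reader needs the field before it can make sense of the file.
meta = b"Content-Type: text/plain"
blob = meta + b"hello"
metadata, data = split_stored_file(blob, len(meta))
```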

A key should contain everything you need to get a file and understand it.
The metadata of the file is part of that. So if the file itself doesn't
know where the metadata is, then the key is the appropriate place to put
it.



_______________________________________________
Devl mailing list
Devl at freenetproject.org
http://lists.freenetproject.org/mailman/listinfo/devl
