On Sat, Feb 13, 2010 at 7:07 PM, xor <xor at gmx.li> wrote:
> On Sunday 14 February 2010 01:00:16 xor wrote:
>
>> I wonder why you do not want the interleaved scheme for all multi-segment
>> files? Why the arbitrary choice of 80 MiB files?
>>
>> It would suck if then people started to artificially bloat 50MiB files up
>> to 80MiB to improve their success rates...
>
> Oh I guess the answer was in your original message:
>> For files of 20 segments (80 MiB) or more, we move to the
>> double-layered interleaved scheme. I'm working on the interleaving
>> code still (it isn't optimal for all numbers of data blocks yet). The
>> simple segmenting scheme is better for smaller files, and the
>> interleaved scheme for large ones. At 18 segments, the segmentation
>> does better. By 20 segments, the interleaved code is slightly better.
>> By 25 segments, the difference is approaching a 1.5x reduction in
>> failure rates. (Details depend on block success rate. I'll post them
>> on the bug report shortly.)
Yeah, that's the answer. At 50M, simple segments do better than
interleaving. The 50M simple segment file is better than either one at
80M.

> ... Another question: Will you implement code to dynamically decide, based
> on file size, how much interleaving is needed? So that we do not have to
> modify anything even if people start inserting 1 TiB files?
>
> - It doesn't seem wise to have any assumptions about maximal file size, as
> it changes over the years.

There are a variety of options here. This scheme has several things to
recommend it. The decoder and encoder are very simple; the hard part is the
interleaver. Depending on what we decide for the metadata format, it's
entirely possible to structure it so we can change the interleaving but
still work with old decoders. That would require storing the segment layout
itself, rather than simply saying "compute the segment layout for n blocks
using scheme number x" and counting on the decoder having an implementation
of scheme x available.

That's not actually that big a penalty, though. Worst case, if we don't do
anything clever about packing it efficiently, it adds about 12 B of metadata
per data block; with careful compression I think it's ~4 B per data block
(we already spend 138 B per data block just storing the CHK URIs, though we
could reduce that to 64).

So my current recommendation is that I'll produce an interleaver scheme that
is better than simple segments for all files > ~80M and degrades slowly for
very large files (I'm not sure where that boundary is, but even as files get
very large it will outperform simple segments by a huge margin). Then we
store the full interleaving pattern as metadata. That makes the upgrade path
very smooth when we later decide that huge files (1 TiB? Bigger?) are an
issue.

Evan Daniel
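
To make the "store the full interleaving pattern as metadata" idea concrete,
here is a minimal sketch of what an explicit block-to-segment table could
look like. This is purely illustrative: the class and method names are
hypothetical, it is not Freenet's actual metadata format, and the naive
8 B/block encoding is just one point in the 12 B worst case / ~4 B compressed
range estimated above.

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Illustrative sketch only -- NOT Freenet's actual metadata format.
 * Instead of recording "segment layout = scheme number x for n blocks" and
 * hoping the decoder implements scheme x, the metadata records, for every
 * data block, which segment it belongs to and its position within that
 * segment. A decoder can then reassemble segments without knowing anything
 * about the interleaver that produced them.
 */
public final class ExplicitSegmentLayout {

    // blockToSegment[i] = segment index of data block i
    // blockToIndex[i]   = position of data block i within that segment
    private final int[] blockToSegment;
    private final int[] blockToIndex;

    public ExplicitSegmentLayout(int[] blockToSegment, int[] blockToIndex) {
        this.blockToSegment = blockToSegment;
        this.blockToIndex = blockToIndex;
    }

    /**
     * Naive serialization: two 4-byte ints per block, i.e. 8 B/block before
     * any clever packing. The message above estimates ~12 B/block worst case
     * and ~4 B/block with careful compression; the exact figure depends on
     * what else is stored alongside the layout.
     */
    public void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(blockToSegment.length);
        for (int i = 0; i < blockToSegment.length; i++) {
            out.writeInt(blockToSegment[i]);
            out.writeInt(blockToIndex[i]);
        }
    }

    public static ExplicitSegmentLayout readFrom(DataInputStream in) throws IOException {
        int n = in.readInt();
        int[] seg = new int[n];
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            seg[i] = in.readInt();
            idx[i] = in.readInt();
        }
        return new ExplicitSegmentLayout(seg, idx);
    }

    /** The only question the decoder ever asks: where does block i go? */
    public int segmentOf(int block) { return blockToSegment[block]; }

    public int indexInSegment(int block) { return blockToIndex[block]; }
}

The point of the sketch is simply that the metadata carries the layout
itself, so the interleaver can change later without breaking decoders that
only know how to read the table.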
