infinity0 has recently completed nextgens' work to make the native C FEC codec 
work on 64-bit, with some help from nextgens and sdiz. FEC has always been 
dramatically faster (historically 50X!) in native code than in Java, so we 
need to deploy this soon: it will now work on 32-bit and 64-bit x86, which 
covers just about everyone. sdiz believes the code is endian-safe, in which 
case it should be possible to compile it even on PPC etc. In any case, sdiz 
has built the binaries for x86-64 on Windows; we also need binaries for OS X 
(32-bit, and 64-bit if it supports it).

On a Core 2 Duo, infinity0's benchmarks show around 700MB/sec encode or decode 
(single-threaded) for 8-bit codecs (e.g. our current 128/128). 16-bit codecs 
*should* be around 4 times slower, but this has not yet been tested. Larger 
segments should increase reliability (vivee: how much?). Assuming that 16-bit 
codecs achieve around 175MB/sec, this is very tempting...

The parameters:
- Segment size. The number of data blocks in a segment, which is equal to the 
number of check blocks in a segment. Right now this is 128, the maximum 
achievable with an 8-bit code at 100% redundancy. Some other applications use 
higher redundancy while still having a 128-block segment size; it is not 
immediately clear whether this uses an 8-bit code, and when I have tried such 
things I have had segfaults or other problems.
- Packet size (stripe size). The size of the slice from each block which we 
read when decoding or encoding a segment. Currently 4K, although with such 
small segments we should seriously consider reading whole blocks to speed 
things up; it's only 8MB ...
- Acceptable memory usage. I am not certain what the memory overhead is within 
the native codec, beyond the passed-in buffers; the passed-in buffers can be 
allocated from a direct byte buffer, meaning there is no duplication between 
the JVM and the library, but the library itself may allocate additional 
structures - infinity0? nextgens?
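To make the numbers behind the packet-size bullet concrete, here is a quick 
sketch (my own illustration, assuming 32K blocks and the current 128/128 
segments; the class and method names are hypothetical):

```java
// Sketch of the sizes behind the current parameters. Assumptions:
// 32KiB blocks, 128 data + 128 check blocks per segment, 4KiB packets.
public class FecParams {
    static final int BLOCK_SIZE = 32 * 1024;   // bytes per block
    static final int SEGMENT_BLOCKS = 128;     // data blocks (= check blocks)
    static final int PACKET_SIZE = 4 * 1024;   // stripe read from each block

    // Total bytes if we read whole blocks for an entire segment
    // (data + check): 256 * 32K = 8MB, hence "it's only 8MB".
    static long wholeSegmentBytes() {
        return (long) SEGMENT_BLOCKS * 2 * BLOCK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(wholeSegmentBytes() / (1024 * 1024) + "MB");
    }
}
```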

The problems/constraints:
- Memory usage will be segment size * 2 * stripe size, plus any structures 
allocated by the library, during decode / encode. Right now this is tiny.
- During decode / encode, we will have to read all the stripes. Assuming 
memory is insufficient, this could be a very large number of seeks - around 
segment size * 2 * 32K / stripe size. This will likely dominate over the CPU 
usage involved! We can avoid seeks on decode by changing the on-disk format, 
but then we have more seeks not only when finding a block (which might be 
acceptable), but when decompressing/streaming to the client (which probably 
isn't).
- If we assume 20ms per seek, this makes (segment size * 2 * (32K / stripe 
size) * 20 / 1000) seconds for a decode... assuming there is no other disk 
I/O, which of course there will be. On the other hand, there is a good chance 
it won't be a full 20ms per seek: the blocks should be close together if we're 
lucky.
- In db4o, we select a sub-segment or a segment at a time for doing requests. 
Hence if segments are huge, memory requirements increase, and some database 
jobs take longer.
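Plugging numbers into the seek formula above (a sketch of my own; it assumes 
32K blocks, a flat 20ms per seek, and no OS caching, as in the constraints 
list):

```java
// Back-of-envelope seek cost for a segment decode. Assumptions:
// 32KiB blocks, 20ms per seek, no OS caching, seek time dominates CPU.
public class SeekCost {
    static final int BLOCK_SIZE = 32 * 1024;

    // Seeks per decode: segment size * 2 * (32K / stripe size)
    static long seeks(int segmentBlocks, int stripeSize) {
        return (long) segmentBlocks * 2 * (BLOCK_SIZE / stripeSize);
    }

    // Worst-case decode time in seconds at 20ms per seek
    static double decodeSeconds(int segmentBlocks, int stripeSize) {
        return seeks(segmentBlocks, stripeSize) * 0.020;
    }

    public static void main(String[] args) {
        // Current 128-block segments, 4K stripes: 2048 seeks, ~41 seconds
        System.out.println(seeks(128, 4096) + " seeks, "
                + decodeSeconds(128, 4096) + "s");
        // Proposed 1024-block segments: 16384 seeks, ~5.5 minutes
        System.out.println(seeks(1024, 4096) + " seeks, "
                + decodeSeconds(1024, 4096) + "s");
    }
}
```

The 1024-block figure is where the "around 5 minutes" estimate further down 
comes from.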

Decode rate is (segment size in bytes) / (decode time in seconds) =
(segment size * 32K) / (segment size * 2 * (32K / stripe size) * 0.02) =
1 / (2 * 0.02 / stripe size) =
stripe size / 0.04 =
25 * stripe size

So if we want a decode rate of at least 100KB/sec (disk-limited), that means 
stripe size must be no less than 4KB; if we want 200KB/sec, stripe size must 
be no less than 8KB, etc.
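The cancellation in the derivation (segment size drops out entirely) can be 
checked numerically; a small sketch of my own under the same 20ms-per-seek 
assumption:

```java
// Checks the decode-rate derivation: rate = 25 * stripe size (bytes/sec),
// independent of segment size, under the 20ms-per-seek assumption.
public class DecodeRate {
    static final int BLOCK_SIZE = 32 * 1024;

    // Bytes decoded divided by seconds spent seeking
    static double decodeRate(int stripeSize, int segmentBlocks) {
        double bytes = (double) segmentBlocks * BLOCK_SIZE;
        double seconds = segmentBlocks * 2.0
                * (BLOCK_SIZE / (double) stripeSize) * 0.020;
        return bytes / seconds; // simplifies to 25 * stripeSize
    }

    public static void main(String[] args) {
        // Segment size cancels: both give 25 * 4096 = 102400 B/s = 100KB/sec
        System.out.println(decodeRate(4096, 128));
        System.out.println(decodeRate(4096, 1024));
        // 8K stripes: 204800 B/s = 200KB/sec
        System.out.println(decodeRate(8192, 1024));
    }
}
```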

Assuming we go for 4K stripe size, segment size is determined by acceptable 
memory usage: segment size = memory usage / (2 * stripe size). So 1024 blocks 
(32MB segments) for 8MB, 2048 blocks (64MB segments) for 16MB, 4096 blocks 
(128MB segments) for 32MB.
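The memory-budget arithmetic above, as a sketch (my own illustration; assumes 
32K blocks and a 4K stripe size, and "segment size" in MB counts data blocks 
only):

```java
// Segment size allowed by a given memory budget:
// segment size (blocks) = memory / (2 * stripe size);
// segment data size (bytes) = blocks * 32K.
public class SegmentSizing {
    static final int BLOCK_SIZE = 32 * 1024;

    static int segmentBlocks(long memoryBytes, int stripeSize) {
        return (int) (memoryBytes / (2L * stripeSize));
    }

    static long segmentDataBytes(int segmentBlocks) {
        return (long) segmentBlocks * BLOCK_SIZE;
    }

    public static void main(String[] args) {
        int[] budgetsMB = {8, 16, 32};
        for (int mb : budgetsMB) {
            int blocks = segmentBlocks(mb * 1024L * 1024, 4096);
            System.out.println(mb + "MB budget -> " + blocks + " blocks, "
                    + segmentDataBytes(blocks) / (1024 * 1024)
                    + "MB segments");
        }
    }
}
```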

1024 seems reasonable to me: 32MB segments instead of 4MB segments; any 
lossy-compressed audio track will be a single segment; an ISO will be 22 
segments (instead of 350!); segment decodes will take around 5 minutes; and 
the effects on db4o are not too bad.

If there is plenty of memory, hopefully the operating system will have cached 
the blocks we have recently downloaded, and decodes will be much faster than 
the above. However, IMHO it is important that Freenet work adequately on 
low-end systems: old recommissioned systems now acting as dedicated servers 
(geeky but we have lots of geeks), quasi-embedded stuff such as fanless 
bittorrent boxes, and so on. Alternatively, we *could* cache all blocks from 
the segment we are currently downloading, in full, in JVM-controlled memory. 
This would maximise decode rates, but would use a lot of memory, and limit 
the possible segment sizes: 1024 blocks is 32MB of data blocks plus 32MB of 
check blocks, so we would have to increase memory requirements by another 
64MB...

Thoughts?


_______________________________________________
Devl mailing list
[email protected]
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl