I don't know whether these are necessary yet... though the first problem 
solved by the first optimisation may be a fairly big one...

1. pendingKeys.
===============

Two problems:
A) When we register a big splitfile, it takes ages and the JVM goes into 
full GC constantly for many minutes, whereas trunk doesn't do this. That is 
not acceptable if Freenet is in the background and a game is in the 
foreground!
B) Every time we fetch a block we need to do a database lookup 
(tripPendingKeys()), and hence disk I/O.

Bloom filters:
- User chooses maximum download queue size.
- Global bloom filter, in RAM, 8MB for 100GB maximum download queue size.
- Each download also has an 8MB bloom filter (maybe sparsely encoded), kept 
on disk only, used when removing a download to recalculate the global filter 
from the remaining downloads.
- Each download has its own in-RAM bloom filter, 19 bits per key (0.01% 
false positives), using hash functions different from the global filter's. 
Expect 16MB total for 100GB of downloads (say 25x4GB ISOs).
- So when we get a key, we check the main filter. If that's positive, we 
iterate over the per-download filters and check each one. For each one that 
matches, we iterate over its keys (it may have per-segment bloom filters on 
disk to speed this up). A sketch of this lookup path follows the list.
- Registration of splitfile downloads can be greatly sped up: we need to 
compute a couple of bloom filters in RAM and then save them, but that's it.
- Open question: if we generate the filter and then don't change it, what 
is the impact of blocks we want, but already have, going in and out of the 
datastore?
- Open question: Cost of removal of a big download.
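
A minimal sketch of that lookup path in Java. SketchBloomFilter, 
QueuedDownload and checkAndQueueKeyOnDisk are made-up names for 
illustration, not the real classes:

    import java.util.Arrays;
    import java.util.BitSet;
    import java.util.List;

    // Illustrative Bloom filter using double hashing; not Freenet's actual class.
    class SketchBloomFilter {
        private final BitSet bits;
        private final int sizeInBits;
        private final int numHashes;

        SketchBloomFilter(int sizeInBits, int numHashes) {
            this.bits = new BitSet(sizeInBits);
            this.sizeInBits = sizeInBits;
            this.numHashes = numHashes;
        }

        // Derive the i-th bit index from two base hashes (Kirsch-Mitzenmacher).
        private int index(byte[] key, int i) {
            int h1 = Arrays.hashCode(key);
            int h2 = Integer.rotateLeft(h1, 16) ^ 0x9e3779b9; // cheap second hash
            return Math.floorMod(h1 + i * h2, sizeInBits);
        }

        void add(byte[] key) {
            for (int i = 0; i < numHashes; i++) bits.set(index(key, i));
        }

        boolean mightContain(byte[] key) {
            for (int i = 0; i < numHashes; i++)
                if (!bits.get(index(key, i))) return false;
            return true;
        }
    }

    // Stand-in for a queued download with its own in-RAM filter.
    abstract class QueuedDownload {
        final SketchBloomFilter inRamFilter;
        QueuedDownload(SketchBloomFilter filter) { this.inRamFilter = filter; }
        // Falls back to the database / per-segment on-disk filters.
        abstract boolean checkAndQueueKeyOnDisk(byte[] routingKey);
    }

    class PendingKeyLookup {
        // Called for every key we see; the global filter screens out
        // almost everything without touching the disk.
        static boolean tripPendingKey(byte[] routingKey, SketchBloomFilter global,
                                      List<QueuedDownload> downloads) {
            if (!global.mightContain(routingKey)) return false; // common case: no I/O
            boolean matched = false;
            for (QueuedDownload d : downloads) {
                if (!d.inRamFilter.mightContain(routingKey)) continue;
                matched |= d.checkAndQueueKeyOnDisk(routingKey); // only now hit disk
            }
            return matched;
        }
    }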


2. Constant disk I/O from queued downloads
==========================================

The two big causes of database I/O on an otherwise idle node with lots of 
hard-to-find downloads queued are request selection and retries.

Request selection: We choose a request from the queue structure, remove it 
from the queue structure, create a PersistentChosenRequest, store that, and 
add it to the queue of requests to run.

We could eliminate the write part of this, and hope that the read part is in 
RAM.

Naive approach: use a hashmap in RAM to exclude stuff we're already fetching 
(we sort of do this already). Don't remove the requests from the queue 
structure (SendableRequests). Don't store PersistentChosenRequests.
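
Something like this, with made-up names (Request stands in for 
SendableRequest):

    import java.util.Collections;
    import java.util.HashSet;
    import java.util.Set;

    // Stub standing in for SendableRequest, just for this sketch.
    interface Request {}

    // Hypothetical in-RAM exclusion set; nothing here touches the database.
    class RunningRequestTracker {
        private final Set<Request> running =
                Collections.synchronizedSet(new HashSet<Request>());

        // True if we claimed the request; false if it is already being fetched.
        boolean claim(Request req) {
            return running.add(req);
        }

        // Called on completion or failure so the request can be selected again.
        void release(Request req) {
            running.remove(req);
        }
    }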

Problems with this: we tend to load lots of candidates from the database 
only to exclude them, which increases disk reads.

Better approach: drain a single SendableGet (which we do now), then don't 
visit it again unless more has been added to it (which we can tell by a 
generation number or something).
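
As a sketch (the generation counter would live on the SendableGet and be 
bumped in RAM whenever blocks are added back; DrainTracker is a made-up 
name):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch only: remember the generation at which we drained each getter.
    class DrainTracker {
        private final Map<Object, Long> drainedAtGeneration =
                new HashMap<Object, Long>();

        // Record that we took everything currently queued on this getter.
        void markDrained(Object getter, long generation) {
            drainedAtGeneration.put(getter, generation);
        }

        // Visit the getter again only if blocks were added since we drained it.
        boolean worthVisiting(Object getter, long currentGeneration) {
            Long drained = drainedAtGeneration.get(getter);
            return drained == null || currentGeneration > drained;
        }
    }

An idle getter then never causes a database read or write during selection.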

For inserts we will probably have to remove them and update the chosen-from 
object.



Note that request selection is probably the more efficient of the two parts 
anyway... since we fetch up to 50 blocks from a single 
SplitFileFetcherSubSegment, in one transaction we would commit the subsegment 
and the 50 PersistentChosenRequests. For a retry, we need to commit the 
segment (with an incremented retries count for the specific block), the old 
subsegment and the new subsegment (both objects containing a large number of 
blocks in an array of block numbers).

db4o is read-committed (unless you use lazy query evaluation, which is 
dangerous), so we can't simply commit only every so often. Although... if we 
are selecting segment-wise, maybe we could retry segment-wise as well?

Select a whole segment.

All the requests are executed (subject to higher-priority non-persistent 
requests getting the slots when they need them... but eventually they are 
all executed, or a timeout occurs).

In the meantime, request selection is aware of having selected this segment. 
It doesn't choose it again unless it has to. If the queue length goes below 
some value, and no other segments are available at the same priority/retry 
count level, we call the timeout callback prematurely.

We get a callback when all the blocks have failed. (We'd get a different 
callback if only some of them had failed, e.g. because of a timeout.)

We delete the old subsegment, create the new one, move all the block numbers 
across, and commit.
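
Under db4o that whole step can be one transaction. A sketch against db4o's 
ObjectContainer, assuming guessed Segment/SubSegment shapes (not the real 
Freenet classes):

    import com.db4o.ObjectContainer;

    // Guessed shapes, for illustration only.
    class Segment {
        int[] retries; // per-block retry counts

        void incrementRetries(int[] blockNumbers) {
            for (int block : blockNumbers) retries[block]++;
        }
    }

    class SubSegment {
        final int retryCount;
        int[] blockNumbers;

        SubSegment(int retryCount) {
            this.retryCount = retryCount;
        }
    }

    class SegmentRetry {
        // Called once every block chosen from this subsegment has failed
        // (or the timeout callback fired).
        void onAllBlocksFailed(ObjectContainer container, Segment segment,
                               SubSegment old) {
            SubSegment fresh = new SubSegment(old.retryCount + 1);
            fresh.blockNumbers = old.blockNumbers;  // move the block numbers across
            segment.incrementRetries(old.blockNumbers);
            container.store(segment);
            container.store(fresh);
            container.delete(old);
            container.commit();                     // one commit for the whole retry
        }
    }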